Check your prior posterior overlap (PPO) – MCMC wrangling in R made easy with `MCMCvis`

When fitting a Bayesian model using MCMC (often via JAGS/BUGS/Stan), a number of checks are typically performed to make sure your model is worth interpreting without further manipulation (remember: all models are wrong, some are useful!):

  • R-hat (AKA Gelman-Rubin statistic) – used to assess convergence of chains in the model
  • Visual assessment of chains – used to assess whether posterior chains mixed well (convergence)
  • Visual assessment of posterior distribution shape – used to determine if the posterior distribution is constrained
  • Posterior predictive check (predicting data using estimated parameters) – used to make sure that the model can generate the data used in the model
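
To make the first check concrete, here is a minimal base R sketch of the classic (non-split) Gelman-Rubin calculation for a single parameter. The function name `rhat_basic` and the simulated chains are made up for illustration, and samplers and packages typically report a more refined version of the statistic, so treat this as a picture of the idea rather than a replacement for their output.

```r
# Classic (non-split) Gelman-Rubin statistic for one parameter.
# `draws` is an iterations x chains matrix (hypothetical layout for illustration).
rhat_basic <- function(draws) {
  n <- nrow(draws)                     # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(1)
# four chains sampling the same distribution: R-hat should be close to 1
mixed <- matrix(rnorm(4000), ncol = 4)
# same chains, but with one chain shifted away from the others: R-hat well above 1
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 5

rhat_mixed <- rhat_basic(mixed)
rhat_stuck <- rhat_basic(stuck)
```

Values near 1 suggest the chains agree with one another; values well above 1 suggest they have not converged on the same distribution.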

PPO

One check, however, is often missing: a robust assessment of the degree to which the prior is informing the posterior distribution. Substantial influence of the prior on the posterior may not be apparent through the use of R-hat and visual checks alone. Version 0.9.2 of MCMCvis (now available on CRAN) makes quantifying and plotting the prior posterior overlap (PPO) simple.

MCMCvis is an R package designed to streamline analysis of Bayesian model results derived from MCMC samplers (e.g., JAGS, BUGS, Stan). It can be used to easily visualize, manipulate, and summarize MCMC output. The newest version is full of new features – a full tutorial can be found here.

An example

To check PPO for a model, we will use the function MCMCtrace. As the function is used to generate trace and density plots, checking for PPO is barely more work than just doing the routine checks that one would ordinarily perform. The function plots trace plots on the left and density plots for both the posterior (black) and prior (red) distributions on the right. The function calculates the percent overlap between the prior and posterior and prints this value on the plot. See ?MCMCtrace in R for details regarding the syntax.

#install package
install.packages('MCMCvis', repos = "http://cran.case.edu")

#load package
library(MCMCvis)

#load example data
data(MCMC_data)

#simulate data from the prior used in your model
#number of iterations should equal the number of draws times the number of chains (although the function will adjust if the correct number of iterations is not specified)
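#in JAGS: parameter ~ dnorm(0, 0.001), where 0.001 is the precision
#rnorm takes a standard deviation instead: sd = 1/sqrt(0.001), approximately 32
PR <- rnorm(15000, 0, 32)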

#run the function for just beta parameters
MCMCtrace(MCMC_data, params = 'beta', priors = PR, pdf = FALSE)
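
For intuition about what a percent overlap means, the base R sketch below estimates the shared area under two kernel density curves. It is an illustration under stated assumptions, not a description of how MCMCtrace computes its value: the helper `ppo_percent` and both simulated posteriors are hypothetical. It also shows the precision-to-standard-deviation conversion needed when translating a JAGS prior into an `rnorm` call.

```r
# Sketch: percent overlap between a prior sample and a posterior sample,
# computed from kernel density estimates on a shared grid.
ppo_percent <- function(prior, posterior, n_grid = 1024) {
  rng <- range(c(prior, posterior))
  d_prior <- density(prior, from = rng[1], to = rng[2], n = n_grid)
  d_post  <- density(posterior, from = rng[1], to = rng[2], n = n_grid)
  step <- d_prior$x[2] - d_prior$x[1]
  # area under the pointwise minimum of the two densities, as a percentage
  100 * sum(pmin(d_prior$y, d_post$y)) * step
}

# JAGS dnorm(mean, precision): sd = 1 / sqrt(precision)
prior_sd <- 1 / sqrt(0.001)   # about 31.6

set.seed(42)
prior <- rnorm(15000, 0, prior_sd)
post_informed   <- rnorm(15000, 5, 1)    # hypothetical: data dominated the prior
post_uninformed <- rnorm(15000, 0, 30)   # hypothetical: posterior resembles the prior

ppo_low  <- ppo_percent(prior, post_informed)
ppo_high <- ppo_percent(prior, post_uninformed)
```

A posterior that is much narrower than the prior gives a small overlap, while a posterior that looks like the prior gives an overlap close to 100%.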

Why check?

Checking the PPO has particular utility when trying to determine if the parameters in your model are identifiable. If substantial PPO exists, the prior may simply be dictating the posterior distribution – the data may have little influence on the results. If a small degree of PPO exists, the data was informative enough to overcome the influence of the prior. In the field of ecology, nonidentifiability is a particular concern in some types of mark-recapture models. Gimenez (2009) developed quantitative guidelines to determine when parameters are robustly identifiable using PPO.

While a large degree of PPO is not always a bad thing (e.g., substantial prior knowledge about the system may result in very informative priors used in the model), it is important to know where data was and was not informative for parameter estimation. The degree of PPO that is acceptable for a particular model will depend on a great number of factors, and may be somewhat subjective (but see Gimenez [2009] for a less subjective case). Like other checks, PPO is just one of many tools to be used for model assessment. Finding substantial PPO when unexpected may suggest that further model manipulation is needed. Happy model building!

Other MCMCvis improvements

Check out the rest of the new package features, including the option to calculate the number of effective samples for each parameter, the ability to pass a ‘regular expression’ to the params argument, and the ability to retain the structure of all parameters in model output (e.g., parameters specified as matrices in the model are summarized as matrices).
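
For readers unfamiliar with the number of effective samples, here is a rough base R sketch of the idea: autocorrelated draws carry less information than the same number of independent draws. The function `ess_basic` and its truncation rule are simplifications chosen for illustration and are not claimed to match what the package reports.

```r
# Rough effective sample size for a single chain:
# n / (1 + 2 * sum of autocorrelations), truncating the sum
# at the first non-positive autocorrelation.
ess_basic <- function(x, max_lag = 200) {
  n <- length(x)
  rho <- acf(x, lag.max = min(max_lag, n - 1), plot = FALSE)$acf[-1]  # drop lag 0
  cutoff <- which(rho <= 0)[1]
  if (!is.na(cutoff)) rho <- rho[seq_len(cutoff - 1)]
  n / (1 + 2 * sum(rho))
}

set.seed(7)
independent <- rnorm(5000)                                   # no autocorrelation
sticky <- as.numeric(arima.sim(list(ar = 0.9), n = 5000))    # strongly autocorrelated

ess_independent <- ess_basic(independent)
ess_sticky <- ess_basic(sticky)
```

The independent draws should give a value close to 5000, while the autocorrelated chain should give a value far smaller, even though both contain 5000 draws.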

Follow Casey Youngflesh on Twitter @caseyyoungflesh. The MCMCvis source code can be found on GitHub.

New paper out in Nature Communications

The study, led by Lynch Lab postdoc Chris Che-Castaldo, highlights the need to aggregate abundance estimates over space to produce robust estimates of abundance when substantial stochasticity exists in populations. Adélie penguin population dynamics are inherently noisy, making it difficult to separate signal from noise when using these birds as indicators for environmental change. Nearly the entire global population of Adélie penguins was modeled in this effort, using every piece of publicly available data on Adélie penguin abundance. All code and data (instructions on how to query the database) to run the analyses are available in the supplements! Check out the MAPPPD website to interact with the model results and check out penguin population dynamics for yourself.

s41467-017-00890-0

Congrats to Bento and Catie!

Congratulations to Bento Goncalves, who was recently announced as one of the 2017 IACS Jr. Research Award Winners. This award will help support Bento’s thesis research on using ‘deep learning’ for pack ice seal surveys. Congratulations as well to Catie Foley, who was one of the winners of the 1st Annual STRIDE visualization contest. Nice work!

High School Student Research Highlighted at Annual Stony Brook Women in Science & Engineering Event

Lynch Lab Ph.D. candidate Catie Foley has spent this academic year mentoring seniors from Patchogue-Medford High School enrolled in Stony Brook’s High School Women in Science and Engineering (WISE) Program. The group met weekly to discuss population & spatial ecology and the students designed independent projects ranging from habitat modeling of seabirds to photo identification of seals. This week, the students’ research was highlighted during a poster session at Stony Brook’s annual High School WISE Capstone Ceremony.
Congratulations, ladies!

Lynch Lab Represents at Stony Brook Undergraduate Research & Creativity Symposium

This week, six undergraduates from the Lynch Lab presented their research at Stony Brook’s Undergraduate Research & Creative Activities (URECA) Symposium:

Adaptive Significance of King Penguin (Aptenodytes patagonicus) Crèches
Lisa Caligiuri, Catherine Foley, and Heather Lynch

Variation in Population Dynamics of King Penguins, Aptenodytes patagonicus, Across Phylogenetic and Regional Scales
Vanessa Kennelly, Maureen Lynch, Catherine Foley, and Heather Lynch

Variation in the ecstatic display call of the gentoo penguin (Pygoscelis papua) associated with behavioral responses
Medha Pandey, Maureen Lynch, and Heather Lynch

Climate indices explain variation in fur seal pup mortality
Katla Thorsen, Casey Youngflesh, and Heather Lynch

The Effect of Oceanographic Conditions on Pygoscelis Penguin Population Dynamics
Arianna West, Catherine Foley, and Heather Lynch

Phylogenetic Relationships between Conservation Risk and Life History Traits in Seabirds
Helen Wong, Maureen Lynch, and Heather Lynch

Congratulations to each of these outstanding students!

Lynch Lab Undergraduate Student Wins Summer Research Award

The Lynch Lab is proud to announce that Sara Vincent, an undergraduate student working in the lab, has been awarded Stony Brook’s 2017 Undergraduate Research and Creative Activities (URECA) Biology Alumni Research Award. With the receipt of this award, Sara will spend the summer in the lab working on her independent project examining the spatial patterns of elephant seal harems.

Congratulations, Sara!

Visualizing and wrangling MCMC output in R with `MCMCvis`

Model results can be thought of as a reward for the many hours of model design, troubleshooting, re-design, etc. that analyses often require. Following the potentially exhausting mental exercise to acquire these results, I think we’d all like the interpretation to be as straightforward as possible. However, analyzing MCMC output from Bayesian analyses, which may include hundreds of parameters and/or derived quantities, can often require a fair amount of code and (more importantly) time.

The MCMCvis package was designed to alleviate this problem and streamline the analysis of Bayesian model results. The latest version (0.7.1) is now available on CRAN with bug fixes, speed improvements, and added functionality.

Why MCMCvis?

Using MCMCvis provides three principal benefits:

1) MCMC output fit with Stan, JAGS, or other MCMC samplers can be fed into all package functions as an argument with no further manipulation needed. No need to specify the type of object or how it was fit; the package does all of that behind the scenes.

2) Specific parameters or derived quantities of interest can be specified within each function, to avoid additional steps of data processing. This works using a grep-like call for optimal efficiency.

3) The package creates ‘publication-ready’ posterior estimate visualizations (below). Parameters can now be plotted vertically or horizontally.
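
The grep-like selection mentioned in point 2 can be pictured with a few lines of base R. This is only a sketch of the general idea using made-up parameter names; it is not the package's internal code.

```r
# Hypothetical parameter names, as they might appear in MCMC output
param_names <- c("alpha", "beta[1]", "beta[2]", "beta[10]", "gamma[1]", "beta_sd")

# Exact match on the base name (everything before the square brackets)
base_names <- sub("\\[.*$", "", param_names)
beta_only <- param_names[base_names == "beta"]

# Looser, grep-style match: any name containing "beta"
beta_like <- grep("beta", param_names, value = TRUE)
```

The first approach picks out only the indexed `beta` parameters, while the pattern match also catches `beta_sd`, which is worth keeping in mind when parameter names share a prefix.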

The package has four functions for basic MCMC output tasks:

MCMCsummary – summarize MCMC output for particular parameters of interest

MCMCtrace – create trace and density plots of MCMC chains for particular parameters of interest

MCMCchains – easily extract posterior chains from MCMC output for particular parameters of interest

MCMCplot – create caterpillar plots from MCMC output for particular parameters of interest

The vignette can be found here.

An example workflow may go as follows:

– summarize posterior estimates for just beta parameters

#install package
install.packages('MCMCvis')

#load package
library(MCMCvis)

#load example data
data(MCMC_data)

#run summary function
MCMCsummary(MCMC_data, 
             params = 'beta')
##            mean   2.5%    50% 97.5% Rhat
## beta[1]    0.16   0.06   0.15  0.25    1
## beta[2]   -7.77 -25.82  -7.68  9.78    1
## beta[3]   -5.64 -28.53  -5.76 17.23    1
## beta[4]  -10.39 -25.98 -10.63  5.27    1
## beta[5]    7.52   6.03   7.52  9.05    1
## beta[6]   10.89  10.10  10.89 11.68    1
## beta[7]   -1.91  -4.83  -1.92  1.08    1
## beta[8]    5.38  -6.86   5.45 17.67    1
## beta[9]   13.39   3.28  13.38 23.60    1
## beta[10]  17.63  14.41  17.63 20.86    1

– check posteriors for convergence

MCMCtrace(MCMC_data, 
           params = c('beta[1]', 'beta[2]', 'beta[3]'), 
           ind = TRUE)

rplot01

– extract chains for beta parameters so that they can be manipulated directly

ex <- MCMCchains(MCMC_data, params = 'beta')
#find 22nd percentile (0.22 quantile) for all beta parameters
apply(ex, 2, function(x){round(quantile(x, probs = 0.22), digits = 2)})
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8]  beta[9] beta[10] 
##    0.12   -14.86   -14.80   -16.48     6.91    10.58    -3.09     0.68     9.29    16.36
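
Once the chains are in a plain matrix (rows are draws, columns are parameters), other summaries follow the same pattern. The sketch below uses a small simulated matrix, `ex_sim`, as a stand-in for the extracted chains so that it runs without the package; the values are invented for illustration.

```r
set.seed(3)
# Stand-in for a matrix of extracted chains: rows are draws, columns are parameters
ex_sim <- cbind("beta[1]" = rnorm(3000, 0.16, 0.05),
                "beta[2]" = rnorm(3000, -7.8, 9))

# Posterior probability that each parameter is greater than zero
p_positive <- colMeans(ex_sim > 0)

# A derived quantity, computed draw by draw so that uncertainty carries through
diff_draws <- ex_sim[, "beta[1]"] - ex_sim[, "beta[2]"]
diff_ci <- quantile(diff_draws, probs = c(0.025, 0.5, 0.975))
```

Computing derived quantities on the draws themselves, rather than on posterior means, keeps the full posterior uncertainty in the result.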

– create caterpillar plots for posterior estimates. Point shading indicates whether the 50% CI overlaps 0 (gray with open circle), the 95% CI but not the 50% CI overlaps 0 (gray with closed circle), or neither overlaps 0 (black). This option can be turned off (as below). A variety of other options exist, including the ability to plot posteriors vertically rather than horizontally.

MCMCplot(MCMC_data, 
          params = 'beta', 
          horiz = FALSE,
          rank = TRUE,
          ref_ovl = FALSE,
          xlab = 'My x-axis label', 
          main = 'MCMCvis plot', 
          labels = c('First param', 'Second param', 'Third param', 
          'Fourth param', 'Fifth param', 'Sixth param', 'Seventh param', 
          'Eighth param', 'Ninth param', 'Tenth param'), 
          labels_sz = 1.5, med_sz = 2, thick_sz = 7, thin_sz = 3, 
          ax_sz = 4, main_text_sz = 2)
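
The zero-overlap rule behind the shading can be written out in a few lines of base R. This is a sketch of the logic as described above, with a hypothetical helper `ci_overlap_class` and simulated draws; it is not taken from the package source.

```r
# Classify a parameter by whether its 50% and 95% credible intervals include zero
ci_overlap_class <- function(x) {
  ci50 <- quantile(x, c(0.25, 0.75))
  ci95 <- quantile(x, c(0.025, 0.975))
  includes_zero <- function(ci) ci[[1]] <= 0 && ci[[2]] >= 0
  if (includes_zero(ci50)) {
    "50% CI overlaps 0"
  } else if (includes_zero(ci95)) {
    "95% CI overlaps 0"
  } else {
    "neither overlaps 0"
  }
}

set.seed(11)
sims <- cbind(a = rnorm(4000, 0.1, 1),   # centered near zero
              b = rnorm(4000, 1.2, 1),   # 95% CI reaches zero, 50% CI does not
              c = rnorm(4000, 5, 1))     # well away from zero
classes <- apply(sims, 2, ci_overlap_class)
```

Each column falls into exactly one of the three classes, which correspond to the three point styles in the plot.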
