MixSIAR

Unexpected Individual diet estimates

Open VirginiaMorera opened this issue 4 years ago • 3 comments

Hi,

I am trying to reconstruct the diet of gulls (3 source groups entered as means and SDs, and 3 isotopes) classified into 10 groups (years). Individuals are not tracked repeatedly; they are different individuals every year. I wanted a diet estimate for the "population" each year (using "year" as a fixed factor, with both process and residual error) and then for each individual (using "individual" as a fixed factor, with process error only).

This is what the isospace plots look like: [image: isospace_plot_1_2] [image: isospace_plot_1_3] [image: isospace_plot_2_3]

The analysis by year looks fine to me: run at "normal" length, the diagnostics look OK and the diet proportions match our expectations. [image]

However, when I run the analysis by individual (also at "normal" length), the medians of the posterior estimates for each source and each individual (plotted here by year to ease interpretation, but analysed by individual) look like this: [image]

And the global proportions of the Ind as fixed factor model look like this: [image]

I suspect that what's causing the difference in outputs (given that the consumer, source, and discrimination data are the same) is the pairwise correlation between sources. In the year as factor model they look like this: [image: pairs_plot_year], which is not perfect, but not too bad either.

But in the ind as factor model, they look like this: [image: pairs_plot_ind]

I suspect the strong correlation between the Refuse and Terrestrial sources is wreaking havoc on the posterior estimates, and my questions are:

  • Why is there such a big difference in correlations when running the model with individual as factor and with year as factor?

  • Is there a way I can improve performance of the Ind as factor model, besides increasing the model run to "extreme"? Would that help, in fact?

Thanks for your help!

VirginiaMorera avatar Mar 10 '20 11:03 VirginiaMorera

As an update, I ran an "extreme" run of the same model with individual as factor and the results looked almost exactly the same: [image] [image] [image]

VirginiaMorera avatar Mar 12 '20 10:03 VirginiaMorera

Hi Virginia,

A couple thoughts:

  1. Process only error structure for single datapoints is not multivariate and does not account for correlation in the consumer isotope values. Looks like this is non-trivial for d34S and d13C.

  2. Since you have several levels of Year and Ind (within Year), I'd recommend using hierarchical random effects. Have you run a model with Year and Ind as random effects, Ind nested within Year, and process x resid error? That's what I'd think of as the most natural model for your data and questions (interested in Year and Ind estimates, possibly also the relative importance of Year vs. Ind). This will estimate far fewer parameters and increase the power of your data. You can compare to models with no re, just Year as re, and just Ind as re (keeping the same process x resid error structure for all).

  3. Re: what's up with the Ind as fixed effects + process error model? My guess is that there is enough uncertainty in the source fitting that the Terrestrial mean is being estimated more on top of the consumer data (i.e. lower N, higher S, and higher C than the Terrestrial sample mean, which is what's plotted), and can therefore have proportions approaching 1. If you wanted to test if this is the case, artificially reduce the source and TDF SD and increase the source n. This will effectively fix the source means at their sample means. And re: why is the Refuse-Terrestrial p correlation higher? If the Terrestrial source is moved on top of the consumer data in isospace, then Refuse is confounded with Terrestrial (i.e. the Terrestrial p must go down for Refuse p to go up).
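A toy sketch of the trade-off in point 3, under assumed numbers. This is not MixSIAR's JAGS model: the source values, the consumer value, the two-isotope space, and the rejection tolerance are all hypothetical. It only shows that when two sources pull the mixture in nearly the same direction, the accepted proportions for those two sources become strongly negatively correlated while the third stays well constrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source means in a 2-isotope space (not the gull data).
# Rows: Marine, Refuse, Terrestrial. Refuse and Terrestrial are
# deliberately placed close together.
sources = np.array([
    [-18.0, 14.0],   # Marine
    [-24.0,  8.0],   # Refuse
    [-25.0,  7.5],   # Terrestrial
])

# Hypothetical consumer value: an exact 30/35/35 mix of the sources.
true_p = np.array([0.30, 0.35, 0.35])
consumer = true_p @ sources

# Candidate diet proportions drawn uniformly on the simplex (rows sum to 1).
p = rng.dirichlet(np.ones(3), size=100_000)

# Simple rejection step: keep candidates whose predicted mixture
# falls within an assumed tolerance of the consumer value.
predicted = p @ sources
keep = np.linalg.norm(predicted - consumer, axis=1) < 0.3
accepted = p[keep]

# Correlation between the accepted Refuse and Terrestrial proportions.
corr = np.corrcoef(accepted[:, 1], accepted[:, 2])[0, 1]
print(f"accepted draws: {len(accepted)}")
print(f"corr(Refuse, Terrestrial) = {corr:.2f}")
print("SD of accepted p (Marine, Refuse, Terrestrial):", accepted.std(axis=0).round(3))
```

In this setup the Marine proportion is pinned down fairly tightly, whereas Refuse and Terrestrial can swap for each other with little change in the predicted mixture, which is the pattern a pairs plot would show as a strong negative correlation.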

brianstock avatar Mar 12 '20 14:03 brianstock

Hi Brian, thanks for the insight!

A few things I've tried:

Ind as fixed effects + process error model

I've done as you suggested and artificially increased the Terrestrial n to match the n of the other sources (the original ns were 60, 54, and 10, so Terrestrial was truly underrepresented), and reduced the SD in both the TDF and the sources (reducing it only in the TDF didn't make any significant change). This is how the isospace plots look now: [image] [image] [image]
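A rough way to see why the n = 10 source is so loosely pinned down, assuming the uncertainty in a fitted source mean scales like the standard error SD / sqrt(n). The sample sizes are the ones quoted above; the SD is a hypothetical placeholder, and the labels for the two larger sources are arbitrary because the thread does not say which n belongs to which.

```python
import math

# Sample sizes quoted in the thread; Terrestrial is the n = 10 source.
# "source_1" and "source_2" are placeholder labels for the other two.
sample_sizes = {"source_1": 60, "source_2": 54, "Terrestrial": 10}

# Hypothetical common SD (per mil) for one isotope; not taken from the data.
assumed_sd = 1.0

# Standard error of each source mean under the assumed SD.
standard_errors = {
    name: assumed_sd / math.sqrt(n) for name, n in sample_sizes.items()
}

for name, se in standard_errors.items():
    print(f"{name}: n = {sample_sizes[name]}, SE of mean = {se:.3f}")
```

With equal SDs, the Terrestrial mean is about 2.4 times less certain than a source with n = 60, so raising n or shrinking the SD tightens it toward its sample mean.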

This has changed the output significantly: Terrestrial now seems to make almost no contribution to the diet of individuals, but the proportions don't match those of the "year as factor" model either. [image]

These changes are likely due to changes in the correlations among sources, with a very high correlation now between Terrestrial and Marine... [image]

Should I assume that geometry is not my friend and this is unfixable?

Ind and Year as nested random factors and process + resid error

I've followed your suggestion and run this model. Individual results improve over the model with only individual as a fixed factor. However, there are still some weird results, particularly for years 2005, 2009 and 2010 (96, 16 and 50 samples respectively), and the marine contribution also appears to be much lower than expected from diet studies: [image]

In addition, yearly proportions are estimated (I guess) by somehow combining the individual proportions for each year, so I now have worse estimates of the yearly proportions than I had with the model that used only year as a fixed factor. However, this model with nested random effects has the "best" pairs plot of all the ones tested so far... [image]

I'm going to try the nested model with the modified SDs and n for the sources, and see what happens. That can serve as a "diagnostic", but it is not a permanent fix, since the changes are artificial... Any idea of a permanent solution to this?

Thank you very much for your help!

VirginiaMorera avatar Mar 16 '20 18:03 VirginiaMorera