phyloseq Temporal Autocorrelation, and other Time Series Methods for Ecological Distances

Hi All,

I'm trying to estimate beta diversity on water samples that were taken weekly over an entire year. I've plotted PCoA and NMDS plots and have run adonis tests for my variables: Season, MonthSampleTaken, Season*MonthSampleTaken, and Location.

The results of my adonis tests for MonthSampleTaken came back significant. I am now trying to figure out which months differ significantly, so I would like to run adonis tests for each pair of months (January-February, January-March, January-April, etc.).

I'm having trouble figuring out how to subset my sample data so that I can run the adonis tests properly.

Here is an example of my data and mapping file:

head(Thesis_mapping)
Sample Data:        [6 samples by 11 sample variables]:
     X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName MonthSampleTaken Location Salinity
PE12       PE12              NA                   NA    PE12.fasta             July       M9     26.0
PE15       PE15              NA                   NA    PE15.fasta             July       M9     14.5
PE17       PE17              NA                   NA    PE17.fasta             July       BB     17.0
PE18       PE18              NA                   NA    PE18.fasta             July       M9     17.0
PE20       PE20              NA                   NA    PE20.fasta           August       BB     22.5
PE21       PE21              NA                   NA    PE21.fasta           August       M9     22.5
     WaterTemperature Season Weather    Date
PE12               30    Wet    Rain 7/10/13
PE15               30    Wet    Rain 7/19/13
PE17               30    Wet    Rain 7/23/13
PE18               30    Wet    Rain 7/23/13
PE20               30    Wet    Rain  8/3/13
PE21               30    Wet    Rain  8/3/13

I tried running this command to subset the samples to just January and February, and while the command ran without errors it didn't subset to just the samples from those months:

Jan_Feb= subset_samples(Lauren.scale, MonthSampleTaken="January,February")

Jun 08 '15 13:06 locon833

Try subset_samples(Lauren.scale, MonthSampleTaken == "January" | MonthSampleTaken == "February")

The | means "or"

You could also do

subset_samples(Lauren.scale, MonthSampleTaken %in% c("January", "February"))
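To cover every pair of months rather than one pair at a time, the same %in% filter can be generated for each pair. Here is a minimal sketch in base R, assuming a toy `months` vector that stands in for the MonthSampleTaken column (the values and the names `month_pairs` and `keep` are made up for illustration):

```r
# Toy stand-in for the MonthSampleTaken column of the sample data (made-up values).
months <- c("January", "January", "February", "March", "March")

# Every unordered pair of months present in the data.
month_pairs <- combn(unique(months), 2, simplify = FALSE)

# One logical vector per pair: TRUE for samples taken in either month of the pair.
keep <- lapply(month_pairs, function(p) months %in% p)
names(keep) <- vapply(month_pairs, paste, character(1), collapse = "_")

names(keep)
```

Each logical vector in `keep` should then be usable with phyloseq's prune_samples() to build the two-month object for one adonis run. With 12 months this gives 66 pairs, so the resulting p-values need a multiple-comparison correction.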

Jun 08 '15 13:06 michberr

Hmmm... If it were me, I would use a Mantel test and look at time decay for the month variable, since you can make it numerical here. Then do a separate test for the categorical variables using adonis to see whether season and location have an effect.

With that many time points, it's not surprising that you are getting a significant result. Also, if you tried to figure out which time points differ from each other, you would need to correct for multiple comparisons using a Bonferroni correction (significance threshold / number of comparisons) or another correction.
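As a small illustration of the correction step, base R's p.adjust() does this directly. The p-values below are made up, not results from this dataset; Bonferroni multiplies each p-value by the number of comparisons (capped at 1), which is equivalent to dividing the threshold:

```r
# Hypothetical raw p-values from three pairwise adonis tests (made-up numbers).
p_raw <- c(Jan_Feb = 0.004, Jan_Mar = 0.020, Feb_Mar = 0.300)

# Bonferroni: each p-value times the number of comparisons, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg, a less conservative alternative controlling the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

print(p_bonf)
print(p_bh)
```

With all 66 month pairs, Bonferroni becomes quite strict, which is one reason to consider the "BH" method instead.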

For the mantel test I would start by making your first time sample day 0 and then next month day 30, next month day 60, etc.
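If exact sampling dates are available, as in the Date column of the mapping file shown earlier, one alternative to 30-day steps is to compute the day number from the dates themselves. This is only a sketch: the `dates` vector below copies four dates from the posted table, and the m/d/yy format is an assumption based on how they are printed.

```r
# Dates copied from the posted mapping file; the "%m/%d/%y" format is assumed.
dates <- as.Date(c("7/10/13", "7/19/13", "7/23/13", "8/3/13"), format = "%m/%d/%y")

# Days elapsed since the first sample, so the earliest sample is day 0.
days <- as.numeric(dates - min(dates))

print(days)
```

The resulting vector could be stored as a Days column in the sample data for use in a time-distance matrix.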

library(vegan)
library(phyloseq)

## Jaccard distances
## This takes the site x species matrix (OTU table) and computes the distance between each pair of sampled sites
jacc <- phyloseq::distance(phyloseq_object, "jaccard")

## Time matrix
## This makes a matrix of the differences in time between each pair of samples
df_time <- as(sample_data(phyloseq_object), "data.frame")
time.dist <- dist(df_time$Days, method = "euclidean")
dist.matrix <- as.matrix(time.dist)

## There are a couple of Mantel tests; this is the one in vegan
mantel(jacc, time.dist, method = "pearson")

plot(jacc ~ time.dist, xlab = "Distance between pairs in time (days)", ylab = "Jaccard dissimilarity")
abline(lm(jacc ~ time.dist))

I would think a Mantel test would be more informative if you expect species similarity to decay over time.
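For anyone wondering what the Mantel test does under the hood, here is a rough base-R sketch of the permutation idea. It is not vegan's implementation: the data are simulated, Euclidean distance stands in for Jaccard, and the helper `mantel_r` is a made-up name.

```r
set.seed(1)

# Made-up data: 8 samples taken 30 days apart, with counts for 5 taxa.
days <- c(0, 30, 60, 90, 120, 150, 180, 210)
comm <- matrix(rpois(8 * 5, lambda = 5), nrow = 8)

d_comm <- dist(comm)  # Euclidean here for simplicity, not Jaccard
d_time <- dist(days)

# Mantel statistic: Pearson correlation between the two sets of pairwise distances.
mantel_r <- function(x, y) cor(as.vector(x), as.vector(y))
obs <- mantel_r(d_comm, d_time)

# Null distribution: shuffle sample labels of one matrix (rows and columns together).
n <- length(days)
m_comm <- as.matrix(d_comm)
n_perm <- 999
perm_r <- replicate(n_perm, {
  idx <- sample(n)
  mantel_r(as.dist(m_comm[idx, idx]), d_time)
})

# One-sided p-value: share of permutations at least as correlated as the observed data.
p_value <- (sum(perm_r >= obs) + 1) / (n_perm + 1)
```

Shuffling whole samples, rather than individual distances, is what keeps the pairwise structure of the matrix intact under the null.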

Jun 08 '15 15:06 CarlyMuletzWolz

Thanks @CarlyRae , I think your detailed answer might yield some useful information. I don't know of an "off the shelf" method for a more formal measurement of temporal autocorrelation of microbiome (ecological) distances.

Maybe someone can comment if they know of one?

I will leave this as a feature request for now. I think this functionality is going to be needed as more time-series datasets become available.

Jun 18 '15 18:06 joey711

I am working with time series microbiome data right now and am hoping to contribute some methods to phyloseq in the future.

For now, I will say that Cram et al 2014 in ISME is the best analysis I've seen so far of understanding seasonal and temporal trends. http://www.nature.com/ismej/journal/v9/n3/full/ismej2014153a.html

They use a mantel test to identify correlation between time and ecological dissimilarity. They also use a partial mantel to test for correlation between ecological dissimilarity and environmental variables while controlling for temporal variation. Finally, they used Generalized Additive Mixed Models to pull out which taxa show seasonal and temporal variability. I think there's a lot to work with there!

Jun 18 '15 19:06 michberr

You need to explicitly call the distance function from the phyloseq package i.e.

jacc <- phyloseq::distance(Lauren_data_merged,"jaccard")

On Mon, Jun 22, 2015 at 10:32 AM, locon833 wrote:

Hi @CarlyRae,

I am attempting to calculate jaccard distance so that I can do the Mantel test, but get an error message:

jacc <- distance(Lauren_data_merged, "jaccard")
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘distance’ for signature ‘"phyloseq", "character"’

Any ideas on this?


Jun 22 '15 15:06 michberr

@michberr , yes thank you! I realized that after I had posted.

Thank you, much!

Jun 22 '15 15:06 locon833

@michberr I am trying to wrap my mind around the use of the Mantel test. Guillot and Rousset showed in 2013 that the Mantel test should not be used in the presence of spatial or temporal autocorrelation. Do you know if Cram et al. 2014 tested for it?

Nov 22 '23 08:11 RosarioIacono
