metacoder
metacoder copied to clipboard
Metacoder continuous variable function for longitudinal dataset
Hi Zach, I have a shotgun metagenomic sequencing dataset consisting of 17 patients that have daily vaginal microbiome samples for a period of 21 days post surgical obstetric fistula repair, and I want to visualize the change in relative abundance across time using metacoder. The compare groups functionality wont work because the amount of time points are too many to adequately display a correlation plot, and I don't think the grouping the samples by week indicates an accurate depiction of the changes, which led me to this idea.
I want to use the rate of change in the relative abundance of the dataset as a node color for a heat tree. For example, if plot the days post operation on the x axis and the relative abundance of the taxa on the y, I can get a linear trendline (y=mx+b) for each species (the abundances are the clade markers hits coming out of metaphlan2 imported as a phyloseq object).
I would like to use the slope of the trendline (x) as the color for the node, which should depict the change in relative abundance overall for all the samples and all the time points. Is there anything in the metacoder or taxa packages that will allow me to accomplish this?
Here is the code I have written so far, and here is the starting phyloseq object that I am using for the analysis. Any guidance that you might have for me would be extremely helpful.
Thanks, Michael
Update!
I converted the phyloseq to a taxmap object, then plotted each tax rank in order to get the slope of a line, for which I used as a heat map color in the heat tree to show the rate of change.
This worked beautifully (after some troubleshooting), but the figure is quite busy, so I decided to filter the taxa that only have the greatest rate of change (via the absolute value of the slope) for a downstream figure.
Here is where I am stuck. I have a list called filt
that contains the taxa of interest, but when I go to filter the taxa in the taxmap object, the results are not the taxa of interest.
For instance:
here is the original taxmap object
> obj
<Taxmap>
346 taxa: cs. Acidaminococcaceae, gg. Acinetobacter ... br. Verrucomicrobiales
346 edges: bg->cs, dd->gg, gg->nb, ab->ac, ac->ak ... bg->ct, ab->aj, br->dh, aj->ax, ax->br
here is the filtering list with the tax_ranks as factors
> filt
tax_rank slope abb
1 Bacilli 0.6605342 am
2 Bifidobacteriaceae 0.2054027 bv
3 Bifidobacteriales 0.2054027 az
4 Enterobacteriaceae -0.7340206 db
5 Enterobacteriales -0.7340206 bm
6 Escherichia -0.6560225 fz
7 Escherichia_coli -0.6181949 mo
8 Firmicutes 0.4778686 ae
9 Gammaproteobacteria -0.8955581 au
10 Gardnerella 0.2890978 dq
11 Gardnerella_vaginalis 0.2890978 hb
12 Lactobacillaceae 0.6117220 ch
13 Lactobacillales 0.6520766 bd
14 Lactobacillus 0.6117220 ei
15 Lactobacillus_iners 0.4247242 jj
16 Prevotella 0.3280083 eb
17 Prevotella_intermedia 0.2083886 in
18 Prevotellaceae 0.3389241 ca
19 Proteobacteria -0.8565714 ag
here is the command to filter the taxa
obj_filt<-filter_taxa(obj, taxon_names=filt$tax_rank,reassign_taxa = F,supertaxa = T, subtaxa = F, invert = F)
and here is the output
obj_filt
<Taxmap>
49 taxa: ac. Actinobacteria, ak. Actinobacteria ... dn. Propionimicrobium, ag. Proteobacteria
49 edges: ab->ac, ac->ak, ak->ay, am->bc, bc->cc, ae->am ... eb->ir, bb->ca, ay->bu, bu->dn, ab->ag
and here is the names of the filtered object that dont match the filt list
taxon_names(obj_filt)
ac ak
"Actinobacteria" "Actinobacteria"
ay bc
"Actinomycetales" "Bacillales"
cc am
"Bacillales_noname" "Bacilli"
ab bb
"Bacteria" "Bacteroidales"
ad al
"Bacteroidetes" "Bacteroidia"
dv hs
"Barnesiella" "Barnesiella_intestinihominis"
ar an
"Betaproteobacteria" "Clostridia"
be cl
"Clostridiales" "Clostridiales_Family_XI_Incertae_Sedis"
fy mn
"Enterobacter" "Enterobacter_cloacae"
db bm
"Enterobacteriaceae" "Enterobacteriales"
bf ao
"Erysipelotrichales" "Erysipelotrichia"
eo ae
"Finegoldia" "Firmicutes"
af cu
"Fusobacteria" "Fusobacteriaceae"
bh aq
"Fusobacteriales" "Fusobacteriia"
fq md
"Fusobacterium" "Fusobacterium_nucleatum"
me au
"Fusobacterium_periodonticum" "Gammaproteobacteria"
ga ms
"Klebsiella" "Klebsiella_unclassified"
cn ch
"Lachnospiraceae" "Lactobacillaceae"
bd ei
"Lactobacillales" "Lactobacillus"
jh bz
"Lactobacillus_crispatus" "Porphyromonadaceae"
dy ia
"Porphyromonas" "Porphyromonas_somerae"
eb il
"Prevotella" "Prevotella_copri"
ir ca
"Prevotella_stercorea" "Prevotellaceae"
bu dn
"Propionibacteriaceae" "Propionimicrobium"
ag
"Proteobacteria"
Anyways sorry for the book, I'm just really at a loss here as to why the filtering step isnt working properly
Hello, sorry I did not respond to your first post. I am busy preparing to defend my thesis in the next few weeks and I overlooked this issue. Does this do what you want?
filter_taxa(obj, taxon_names %in% filt$tax_rank, supertaxa = TRUE)
Great idea for a plot by the way. A while ago I thought plotting variables in models fit to the data for each taxon would be cool, but I have never done it.