Phyloseq silently dropping samples when making phyloseq object

Status: open. wlwhalley opened this issue 4 years ago (6 comments).

Hi,

I'm seeing exactly what the title says: the seqtab object has 75 samples, but the phyloseq object has only 69.

The code is pretty simple:

```r
library(phyloseq)

# Load the tree, sequence table, taxonomy and sample metadata
fitGTR <- readRDS("C:/VMshared/analysis/treeits_prefilt.rds")
seqtab <- readRDS("C:/VMshared/analysis/seqtab_ITSpair_final1.rds")
tax <- readRDS("C:/VMshared/analysis/tax_ITSpair_final1.rds")
metadata_ITS <- read.csv("C:/VMshared/master/master.csv", row.names = 1)
tax <- as.matrix(tax)

# Build the phyloseq object (samples are the rows of seqtab)
ps <- phyloseq(otu_table(seqtab, taxa_are_rows = FALSE),
               sample_data(metadata_ITS),
               tax_table(tax),
               phy_tree(fitGTR$tree))
ps
```

I am wondering if it's something to do with the sample naming:

```r
c("E10Y1", "E10Y2", "E10Y3", "ED1", "ED2", "ED3", "EI1", "EI2", "EI3", "MF1", "MF2", "MF3", "MF4", "MH1", "MH2", "MH3", "MM1", "MM2", "MM3", "MM4", "MU1", "MU2", "MU3", "MU4", "NF1", "NF2", "NF3", "NF4", "NM1", "NM2", "NM3", "NM4", "NU1", "NU2", "NU3", "NU4", "P10Y1", "P10Y2", "P10Y3", "PD1", "PD2", "PD3", "PI1", "PI2", "PI3", "S10Y1", "S10Y2", "S10Y3", "S5Y1", "S5Y2", "S5Y3", "SD1", "SD2", "SD3", "SI1", "SI2", "SI3", "WF1", "WF2", "WF3", "WF4", "WM1", "WM2", "WM3", "WM4", "WU1", "WU2", "WU3", "WU4" )
```

The missing samples are S5Y1, S5Y2, S5Y3, P5Y1, P5Y2, P5Y3, E5Y1, E5Y2 and E5Y3. It is obviously something to do with the names of those samples, but I would like to understand why, so that I can avoid the issue when planning future experiments!
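A minimal base-R sketch for seeing exactly which names fail to line up: compare the sample names of the count table and the metadata before calling phyloseq(). It assumes, as in the code above, that samples are the rows of the sequence table (taxa_are_rows = FALSE) and the row names of the metadata (read with row.names = 1). The helper `report_name_mismatch()` and the toy data are hypothetical. My understanding is that phyloseq() keeps only the samples whose names appear in every component, so anything reported here would be dropped without an error.

```r
# Hypothetical helper: compare sample names between a count table and a
# metadata table, both with one row per sample.
report_name_mismatch <- function(counts, metadata) {
  count_ids <- rownames(counts)
  meta_ids <- rownames(metadata)
  list(
    only_in_counts = setdiff(count_ids, meta_ids),
    only_in_metadata = setdiff(meta_ids, count_ids),
    n_shared = length(intersect(count_ids, meta_ids))
  )
}

# Toy data (made up): the metadata lacks S5Y2 and has an extra MF2 row
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("S5Y1", "S5Y2", "MF1"), c("ASV1", "ASV2")))
metadata <- data.frame(site = c("S", "M", "M"),
                       row.names = c("S5Y1", "MF1", "MF2"))

res <- report_name_mismatch(counts, metadata)
print(res)
```

With the real data this would be `report_name_mismatch(seqtab, metadata_ITS)`; any name listed under `only_in_counts` is a sample that would not survive into the phyloseq object.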

wlwhalley, Jun 08 '20

Same problem here: I went from 18 to 17 samples and have no idea why.

I also suspected a naming issue and checked again and again, but I can't find what exactly it's about.

```r
c("NicoLJ1", "NicoLJ2", "NicoLJ3", "NicoLJ4", "NicoLJ5", "NicoLJ6", "NicoLJ7", "Nico1", "Nico2", "Nico3", "Nico4", "Nico5", "Nico6", "Rob1", "Rob2", "Rob3", "Rob4", "Rob5")
```

rsiani, Jun 09 '20

The problem seems to have vanished. I am just shooting in the dark at this point, but could it be a conflict with other loaded packages? I restarted my R session and then it worked again.

rsiani, Jun 09 '20

No, I have retried with several changes to the loaded packages etc., and the problem persists. It's a shame, as it means I won't be able to use phyloseq.

wlwhalley, Jun 21 '20

@wlwhalley Have you cross-checked the data to make sure all samples IDs match across all data? I found this occurs when I generate an initial phyloseq object (w/ counts, ranks, tree) then add in sample data later w/o some of the samples included; vaguely recall also having seen similar w/ sample IDs that start with a number (I now prefix those with a 'Sample-' or similar).
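One possible mechanism behind the IDs-starting-with-a-number case (an assumption; nobody in this thread confirmed it) is R's name sanitising. When a table with samples as columns is read with read.csv() under the default check.names = TRUE, its column names are passed through make.names(), which prepends "X" to names that start with a digit and replaces characters such as "-" with "."; row names read via row.names = 1 are not altered this way. Once changed, those IDs no longer match the other components. The sample IDs below are made up.

```r
# Made-up sample IDs; two start with a digit
ids <- c("10Y1", "5Y2", "E5Y3", "MF1")

# make.names() is what read.csv() applies to column names by default
sanitised <- make.names(ids)
print(sanitised)

# IDs altered by sanitising would no longer match the other tables
changed <- ids[ids != sanitised]
print(changed)

# A hyphen is not valid in a syntactic name either, so it becomes "."
print(make.names("Sample-4"))

# Prefixing with a letter plus an underscore gives names make.names() keeps
prefixed <- paste0("S_", ids)
stopifnot(identical(make.names(prefixed), prefixed))
```

Under this mechanism a "Sample-" prefix would itself be rewritten to "Sample.", which is why the sketch uses an underscore; passing check.names = FALSE to read.csv() is the other way to keep column names untouched.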

cjfields, Jul 05 '20

I have the same problem: when I make a phyloseq object, one sample is lost. Do I just need to rename that one sample? The sample names are: "AxBT01","AxBT03","AxBT05","AxBT08", "AxBT10","AxBT18","AxBT36", "AxBT38", "AxBT43", "AxBT44", "AxBT47", "AxBT66","ExBT03", "EBT4", "LBT03", "LBT04", "LBT05","LBT07","LBT08","LBT10","LBT09", "LBT89","LBT90", "LBT91", "LBT92", "LBT93", "LBT94", "LBT96", "LBT97", "LBT98", "LBTL7","LBTL9A","LBTL10". AxBT10 is the one that gets dropped when making the phyloseq object.

fullertonhe, Dec 18 '20

Not sure if anyone is still running into this issue, but I was and these solutions weren't fixing it for me. I ended up finding that 8 of my samples had a space at the end of the sample name and phyloseq drops them because of that. So if none of the above solutions worked for you, check for spaces!
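A minimal base-R sketch for finding and stripping such whitespace, assuming the sample names are the row names of the metadata (as with read.csv(..., row.names = 1) above). The toy names are made up.

```r
# Toy metadata whose row names carry stray whitespace (made-up example)
metadata <- data.frame(depth = c(5, 10, 15),
                       row.names = c("AxBT10 ", "AxBT18", " LBT03"))

ids <- rownames(metadata)
# Flag names that differ from their whitespace-trimmed version
has_space <- ids != trimws(ids)
# Brackets make leading/trailing whitespace visible when printed
print(sprintf("[%s]", ids[has_space]))

# Strip leading/trailing whitespace and make sure names are still unique
clean_ids <- trimws(ids)
stopifnot(!anyDuplicated(clean_ids))
rownames(metadata) <- clean_ids
```

The same trimws() call can be applied to rownames(seqtab) so that both components agree. By default trimws() only removes spaces, tabs and newlines; names copied from spreadsheets can also contain non-breaking spaces, which `trimws(x, whitespace = "[\\h\\v]")` (R 3.6.0 or later) strips as well.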

kld93, Jun 21 '23