MicrobiotaProcess

Get the taxonomic information of same genus?

Open BinhongLiu opened this issue 4 years ago • 6 comments

Hi, I'm trying to get the abundance table at genus level with the get_taxadf function. However, some genera may share the same name (e.g. g__Clostridium from different families). Is there any way to obtain the corresponding higher-level taxonomic information for genera with the same name? Many thanks! Best regards! Hongbin Liu

BinhongLiu avatar Aug 06 '20 13:08 BinhongLiu

Good question! Taxa names are sometimes ambiguous (the same name under different parents) or missing. To solve this, we developed fillNAtax, which automatically uses the upper-level taxonomic information to complete them. For example:

dt <- data.frame(
    k=c("B","B","A","A"), 
    p=c("C","C","C","D"),
    c=c("E","F","G","H"),
    o=c("I","J","K","L"),
    f=c("M","N","O",NA), 
    g=c("R","R","S","T"))

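This is the original data.frame. Note that C (column p) and R (column g) each appear under different upper-level taxa, and that one value in column f is missing.

  k p c o    f g
1 B C E I    M R
2 B C F J    N R
3 A C G K    O S
4 A D H L <NA> T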

fillNAtax is an internal function, so it has to be called with the ::: operator.

MicrobiotaProcess:::fillNAtax(dt)

It outputs the following. The upper-level names have been appended to C and R, and the NA has been filled in as f__un_o__L.

     k         p    c    o          f         g
3 k__A p__C_k__A c__G o__K       f__O      g__S
1 k__B p__C_k__B c__E o__I       f__M g__R_f__M
2 k__B p__C_k__B c__F o__J       f__N g__R_f__N
4 k__A      p__D c__H o__L f__un_o__L      g__T
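The renaming rule visible in this output can be sketched in base R. This is only an illustration of the idea, not the actual code of fillNAtax: disambiguate_taxa is a hypothetical helper, it assumes the top rank has no missing values, and it keeps the input row order.

```r
# Hypothetical helper (not part of MicrobiotaProcess): a simplified
# illustration of the renaming idea, assuming no NA in the first column.
disambiguate_taxa <- function(tax) {
  tax <- as.data.frame(lapply(tax, as.character), stringsAsFactors = FALSE)
  ranks <- colnames(tax)
  # Step 1: prefix every known name with its rank, e.g. "R" -> "g__R".
  for (r in ranks) {
    known <- !is.na(tax[[r]])
    tax[[r]][known] <- paste0(r, "__", tax[[r]][known])
  }
  # Step 2: walk from the second rank downwards.
  for (i in seq_along(ranks)[-1]) {
    child  <- tax[[i]]
    parent <- tax[[i - 1]]
    # Fill a missing name with "<rank>__un_<parent name>".
    missing <- is.na(child)
    if (any(missing)) {
      child[missing] <- paste0(ranks[i], "__un_", parent[missing])
    }
    # A name is ambiguous if it occurs under more than one parent.
    n_parents <- vapply(split(parent, child),
                        function(p) length(unique(p)), integer(1))
    ambiguous <- n_parents[child] > 1
    child[ambiguous] <- paste(child[ambiguous], parent[ambiguous], sep = "_")
    tax[[i]] <- child
  }
  tax
}

dt <- data.frame(
    k=c("B","B","A","A"),
    p=c("C","C","C","D"),
    c=c("E","F","G","H"),
    o=c("I","J","K","L"),
    f=c("M","N","O",NA),
    g=c("R","R","S","T"))
res <- disambiguate_taxa(dt)
res
```

Only names that occur under more than one parent get a suffix, which is why p__D and g__S stay short while p__C and g__R are extended.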

get_taxadf already applies this function internally. So if you want to get specific genera, you can use the following code to extract them.

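# yourps is a placeholder for your own phyloseq object; taxlevel=6 selects genus
gda <- get_taxadf(yourps, taxlevel=6)
gda <- as.data.frame(gda@otu_table)
# keep every row whose name contains g__Clostridium
g <- gda[grep("g__Clostridium", rownames(gda)),]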

If the same genus name occurs in different families, all of the matching rows will be returned.
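A small base R sketch of that lookup, using a toy table in place of the real get_taxadf result (the taxa names and counts below are made up for illustration). drop = FALSE keeps the result a data.frame even if only one row matches, and fixed = TRUE treats the pattern as a literal string.

```r
# Toy abundance table; taxa names and counts are made up for illustration.
gda <- data.frame(
  sample1 = c(10, 3, 7),
  sample2 = c(0, 5, 2),
  row.names = c("g__Clostridium_f__Clostridiaceae",
                "g__Clostridium_f__Lachnospiraceae",
                "g__Bacteroides")
)

# Every row whose name contains the literal string "g__Clostridium".
hits <- gda[grep("g__Clostridium", rownames(gda), fixed = TRUE), , drop = FALSE]
rownames(hits)
```

Because grep matches substrings, any other row name that merely starts with g__Clostridium would be returned as well, so it is worth checking the matched names.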

xiangpin avatar Aug 06 '20 16:08 xiangpin

That's a really nice function! So I would not get a table containing genera (or higher taxonomic levels) with the same name if I use the get_taxadf function, right? I just checked my genus table derived from get_taxadf and found that the genus names automatically include the upper-level taxonomic information until a distinguishing name is reached. What if I need the whole taxonomic information (e.g. for a genus-level table, names from rank 6 to rank 1; for a family-level table, names from rank 5 to rank 1; for an order-level table, names from rank 4 to rank 1)? Thanks for your help! Best regards! Hongbin Liu

BinhongLiu avatar Aug 07 '20 01:08 BinhongLiu

Yes, each taxonomy name will be unique. If you want the full taxonomic information, I think you can extract it from the original taxa table. You can use head(ps@tax_table) or phyloseq::tax_table(ps) to view it. In the next version of MicrobiotaProcess, we will add the tax_table to the result of get_taxadf.
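As a rough sketch of building full lineage labels from such a table, assuming the taxonomy has already been converted to a plain data.frame with one column per rank and no missing values (the toy table and the helper full_lineage below are hypothetical, not part of phyloseq or MicrobiotaProcess):

```r
# Toy taxonomy table standing in for the converted tax table of ps.
tax <- data.frame(
  Kingdom = c("B", "B", "A"),
  Phylum  = c("C", "C", "C"),
  Class   = c("E", "F", "G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join ranks 1..level into one label per row.
full_lineage <- function(tax, level) {
  do.call(paste, c(unname(as.list(tax[seq_len(level)])), sep = ";"))
}

# Phylum C occurs under two kingdoms, so it yields two distinct labels.
unique(full_lineage(tax, 2))
```

With the full lineage as the label, taxa that share a name at one rank stay distinguishable without any renaming.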

xiangpin avatar Aug 07 '20 03:08 xiangpin

Yes! The full taxonomic information can be obtained with phyloseq::tax_table(ps), but I'm not sure whether phyloseq::tax_table(ps) does anything similar to fillNAtax. With tax_table, I think I would get a taxa table that still contains genera with identical names, as I mentioned above. I now have the genus-level abundance table derived from get_taxadf and have performed the differential analysis on it, and I want to know the full taxonomic information of the differential taxa. I think the tax_table you mentioned for the result of get_taxadf would solve my problem. I'm looking forward to the next version of MicrobiotaProcess! Thank you for your help and explanation! Hongbin Liu

BinhongLiu avatar Aug 07 '20 03:08 BinhongLiu

  • The tax_table of phyloseq does not have a similar function.
  • The GitHub version already supports it. You can install it from GitHub.

xiangpin avatar Aug 07 '20 04:08 xiangpin

Great! The development version solves my problem perfectly! Thanks!

BinhongLiu avatar Aug 07 '20 04:08 BinhongLiu