microeco
Tax4Fun2 ASV annotations and FAPROTAX issues
Hello,
I have a quick question about tax4fun2. Is it possible to generate a file that has the KO annotations for each individual ASV/OTU in my dataset? Ideally I would like to be able to do things like subset all of my ASVs that are predicted to be capable of nitrogen cycling and examine how the N-cycling community and their metabolic potential change in each sample over time. I think this would be a very powerful and useful feature for microeco to have.
I was able to do this using the output for FAPROTAX, however I ran into a big issue with the program. The database may not have been updated recently, so I get drastically different metabolic results if I generate my taxonomy with the current Silva v138.1 database vs a Silva database from 2020. For example, in the 2020 version I have several ammonia oxidizing bacteria that get annotated, and in the more recent 138.1 version they don't show up- it appears there are only ammonia oxidizing archaea in my samples. Just thought I should point this out in case you'd like to add a disclaimer.
Thank you so much for your help and all of your hard work creating this fantastic R package! I have been recommending microeco to everyone.
Jake Callaghan
Hi Jake @calla404,
Very good idea! I have also been thinking about this recently. Which version of microeco are you using? I added this output file in version 0.9.0. The file is named "res_tax4fun2_reference_profile.tsv" and is written to the temporary directory set by the parameter path_to_temp_folder. Please check it. If path_to_temp_folder is NULL, please locate the temporary directory on your system from the printed messages. Thanks for letting me know about this difference. FAPROTAX is updated very slowly because it is a hand-curated database based on published papers and books, so the taxonomic names in it are unlikely to be updated. Taxonomic naming is a big problem in almost every field of biology. I remember there is also an issue about Funguild concerning differences in taxon names between UNITE and NCBI. Now, back to the question. Could you paste several examples of ammonia-oxidizing bacteria that are no longer annotated? I will check them manually and think about how to handle this better. Thanks for your finding and the good suggestion.
Best, Chi
Hi Chi,
Thanks so much for your prompt response! I was previously using microeco v0.8.0. After updating to v0.9.0 I re-ran tax4fun2 and the tsv file was successfully generated. I was expecting a presence/absence matrix of 1's and 0's like from FAPROTAX, but my output file has a lot of decimal places. For example, 0.3333, 0.142857143, and 0.5 are some of the values I see. What do these numbers correspond to?
As far as the ammonia-oxidizing bacteria go, it looks like several members of the Nitrosomonadaceae get flagged for aerobic ammonia oxidation using the taxonomy sheet from 2020 but not using the current 138.1 database. However, that's very odd because it doesn't look like that name has changed in the latest Silva release. I'll have to go back through my script and make sure I don't have an error in my code that could have led to this.
Thanks again, Jake
Hi Jake @calla404,
The decimals in that file come from two sources. One is that the KO annotation of a protein sequence can be multiple, i.e. one sequence may have two or more KOs. The other is the normalization by 16S copy number. So those numbers are the results after these adjustments. Note that different KO annotation tools can generate distinct results, so if you set use_uproc = F in the cal_tax4fun2 function, the results can also change. If you are only interested in the 0/1 data, you can directly change the values > 0 to 1. The annotation tools, parameters and databases can all cause differences in the results. Please feel free to tell me if the problem still occurs after you check the annotation step.
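A minimal sketch of the 0/1 conversion, assuming the profile has been read into a numeric table with KOs in rows and ASVs in columns. That layout, the toy values, and the names `profile` and `presence` are assumptions for illustration, so check them against your actual file:

```r
# Toy stand-in for the Tax4Fun2 reference profile (layout assumed: KOs x ASVs).
profile <- data.frame(
  ASV1 = c(0.3333, 0, 0.5),
  ASV2 = c(0, 0.142857143, 0),
  row.names = c("K00001", "K00002", "K00003")
)

# Any value above 0 becomes 1 (present); zeros stay 0 (absent).
presence <- as.data.frame((profile > 0) * 1)
print(presence)
```

The comparison keeps the row and column names, so the result can be used as a drop-in presence/absence table.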
Chi
Hi Chi, Thanks for microeco.
Is it possible to run trans_diff on the output of the FAPROTAX functional annotation of samples, as described for tax4fun2?
Thanks in advance.
Hi @carloshenriquezc Yes. Good question! Something like this.
library(microeco)
data(dataset)
t1 <- trans_func$new(dataset)
t1$cal_spe_func()
t1$cal_spe_func_perc()
t1$cal_spe_func_perc(abundance_weighted = TRUE)
# t1$res_spe_func_perc is the result we need
# first create a microtable object, which is required
m1 <- microtable$new(as.data.frame(t(t1$res_spe_func_perc)), sample_table = dataset$sample_table)
# because we do not have a taxonomy table, we put the abundance table directly into the taxa_abund list, which trans_diff needs as input
m1$taxa_abund$OTU <- m1$otu_table
# use the temporary "OTU" level as taxa_level for trans_diff
t1 <- trans_diff$new(dataset = m1, method = "wilcox", group = "Group", taxa_level = "OTU")
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
# other methods are also available; please see the tutorial
Using trans_env is also fine and easier, since the trans_env class already has similar differential tests.
Best, Chi
Hi Chi,
Thanks for your support.
I have another question: is there a way to show the functional plot by group of samples?
I tried using this:
dataset$merge_samples(use_group = "S_T")
t2 <- trans_func$new(dataset)
t2$cal_spe_func(prok_database = "FAPROTAX")
t2$cal_spe_func_perc(abundance_weighted = FALSE)
t2$plot_spe_func_perc()
But the graph still shows the functional traits per sample.
Thanks in advance!
Carlos Henríquez C. PhD. Investigador Titular Centro de Estudios Avanzados en Zonas Áridas (CEAZA) Laboratorio de Fisiología y Genética Marina (FIGEMA)
Fono: (+56 51) 2673262
Larrondo 1281, Coquimbo Chile
Hi. The first line generates a new dataset that should be assigned to a name. Otherwise, it is only printed:
d1 <- dataset$merge_samples(use_group = "S_T")
t2 <- trans_func$new(d1)
t2$cal_spe_func(prok_database = "FAPROTAX")
t2$cal_spe_func_perc(abundance_weighted = FALSE)
t2$plot_spe_func_perc()
Chi
awesome
Thanks Chi!
Hi Chi, Microeco is getting more and more complete.
I have noticed that when I do differential abundance using FungalTraits two overlapping plots are generated. This does not happen if I use Funguild.
I attach both.
Another question: how can I specify the order of the samples without using merge_samples?
Thank you very much again.
Hi @carloshenriquezc
Thanks. I do not see any attachment. When you reply to the email directly, attachments may not be shown. Could you please write and paste it here on GitHub? I cannot get your point without the data, so please post your scripts and your data (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function) so that I can reproduce the issue.
Chi
Thanks, I will post it on GitHub.
Attached the plots.
Hello Jake!
I'm looking for an answer to the same question, specifically how the N-cycling community and their metabolic potential change over time in each sample. Have you figured this out with Tax4Fun2?
Thank you in advance! Alex
Hi Alex,
It has been a while, but here is what I remember: there is a folder that gets made when using tax4fun2 through microeco, and in it is a file called functional_prediction.txt with KO numbers for each sequence. For what you want to look at, you should change all the values > 0 to 1s (Chi explains what the fractions mean further up the thread), and this will be your gene presence-absence matrix. Now all you need is a list of the KO numbers you are interested in for N cycling, and you can subset or filter them out from there. The graphing exploration I did from this point on was all using ggplot and not microeco (though there are additional features now that might be useful).
As an alternative to tax4fun2, you could try PICRUSt2, which is supposed to be more accurate and have a higher number of predictions. This can be set up to give you KO, EC number, and MetaCyc pathway annotations, which can be used in a similar way to what I described above.
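The subsetting step could look roughly like this in base R. Everything here is a toy sketch: the matrix layouts (KOs x ASVs, ASVs x samples), the object names, and the two KO IDs are assumptions for illustration (K10944 and K02588 are, as far as I recall, amoA/pmoA and nifH in KEGG, but please verify the IDs you actually need):

```r
# Toy presence/absence matrix (layout assumed: KOs in rows, ASVs in columns).
presence <- matrix(
  c(1, 0, 0,
    0, 0, 1,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("K10944", "K02588", "K00001"), c("ASV1", "ASV2", "ASV3"))
)

# Toy ASV abundance table (ASVs in rows, samples in columns).
abund <- matrix(
  c(10, 0,
     5, 5,
     0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2"))
)

# Illustrative KO list; verify the IDs against KEGG before use.
n_cycle_kos <- c("K10944", "K02588")

# Keep only the KOs of interest that are present in the matrix.
hits <- presence[rownames(presence) %in% n_cycle_kos, , drop = FALSE]

# ASVs predicted to carry at least one of those KOs.
n_asvs <- colnames(hits)[colSums(hits) > 0]

# Summed abundance of those ASVs in each sample.
n_abund <- colSums(abund[n_asvs, , drop = FALSE])
print(n_asvs)
print(n_abund)
```

The per-sample sums can then be joined to the sample metadata and plotted over time with ggplot.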
I hope this helps! Let me know if you have any more questions.
Jake Callaghan
Hi Chi,
I was trying to convert the PICRUSt2 pathway output files to a microtable object.
I receive this message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
When running:
tmp_file_path <- system.file("extdata", "path_abun_unstrat.tsv", package="file2meco")
pathway_table <- read.delim(tmp_file_path, row.names = 1)
Find attached the table.
Hope you can help me with this!
Hi Carlos, Please find this part in the tutorial (https://chiliubio.github.io/microeco_tutorial/file2meco-package.html#picrust2).
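One possible cause of the error, offered as a guess rather than a diagnosis: system.file() returns an empty string when the requested file is not found inside the installed package, and reading from an empty path can then fail with "no lines available in input". A small check before reading, using a toy file in a temporary directory in place of a real PICRUSt2 output (the table contents and the helper name read_checked are hypothetical):

```r
# system.file() returns "" when the file does not exist in the package.
missing_path <- system.file("extdata", "no_such_file.tsv", package = "stats")
print(missing_path == "")

# Hypothetical helper: stop early with a clear message if the path is unusable.
read_checked <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("File not found: '", path, "'. Pass the full path to your own file.")
  }
  read.delim(path, row.names = 1)
}

# Toy table standing in for a PICRUSt2 pathway abundance file.
toy_path <- file.path(tempdir(), "toy_path_abun.tsv")
writeLines(c("pathway\tS1\tS2", "PWY-1\t1.5\t2", "PWY-2\t0\t3"), toy_path)

pathway_table <- read_checked(toy_path)
print(pathway_table)
```

For your own data, pass the full path to the file on disk instead of going through system.file(), which only looks inside the package installation.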
Chi