
Tax4Fun2 ASV annotations and FAPROTAX issues

Open calla404 opened this issue 2 years ago • 15 comments

Hello,

I have a quick question about tax4fun2. Is it possible to generate a file that has the KO annotations for each individual ASV/OTU in my dataset? Ideally I would like to be able to do things like subset all of my ASVs that are predicted to be capable of nitrogen cycling and examine how the N-cycling community and their metabolic potential change in each sample over time. I think this would be a very powerful and useful feature for microeco to have.

I was able to do this using the FAPROTAX output; however, I ran into a big issue with the program. The database may not have been updated recently, so I get drastically different metabolic results if I generate my taxonomy with the current SILVA v138.1 database versus a SILVA database from 2020. For example, with the 2020 version several ammonia-oxidizing bacteria get annotated, but with the more recent 138.1 version they don't show up, and it appears there are only ammonia-oxidizing archaea in my samples. Just thought I should point this out in case you'd like to add a disclaimer.

Thank you so much for your help and all of your hard work creating this fantastic R package! I have been recommending microeco to everyone.

Jake Callaghan

calla404, May 25 '22

Hi Jake @calla404,

Very good idea! I have also been thinking about this recently. Which version of microeco are you using? I added this output file in version 0.9.0. The file is named "res_tax4fun2_reference_profile.tsv" and is saved in the temporary directory given by the path_to_temp_folder parameter; please check it. If path_to_temp_folder is NULL, the temporary directory is chosen by the system, and you can find its location in the messages printed during the run.

Thanks for pointing out this difference. FAPROTAX is updated very slowly because it is a hand-curated database built from published papers and books, so the taxonomic names in it are rarely updated. Taxonomic naming is a big problem in almost every field of biology; I remember there was also an issue about FUNGuild concerning the differences in taxon names between UNITE and NCBI.

Back to the question: could you paste several examples of the ammonia-oxidizing bacteria that are annotated with one SILVA version but not shown with the other? I will check them manually and think about how to solve this better. Thanks for your finding and good suggestion.

Best, Chi

ChiLiubio, May 26 '22

Hi Chi,

Thanks so much for your prompt response! I was previously using microeco v0.8.0. After updating to v0.9.0 I re-ran tax4fun2 and the tsv file was successfully generated. I was expecting a presence/absence matrix of 1's and 0's like from FAPROTAX, but my output file has a lot of decimal places. For example, 0.3333, 0.142857143, and 0.5 are some of the values I see. What do these numbers correspond to?

As far as the ammonia-oxidizing bacteria go, it looks like several members of the Nitrosomonadaceae get flagged for aerobic ammonia oxidation with the 2020 taxonomy sheet but not with the current 138.1 database. However, that's very odd, because that name doesn't appear to have changed in the latest SILVA release. I'll have to go back through my script and make sure I don't have an error in my code that could have led to this.

Thanks again, Jake

calla404, May 26 '22

Hi Jake @calla404,

The decimals in that file have two sources. First, the KO annotation of a protein sequence may be multiple, i.e. one sequence may be assigned two or more KOs. Second, the values are normalized by 16S rRNA gene copy number. So those numbers are the results after these adjustments. Note that different KO annotation tools can generate distinct results, so if you set use_uproc = FALSE in the cal_tax4fun2 function, the results can also change. If you are only interested in 0/1 data, you can directly change the values > 0 to 1. The annotation tools, parameters and databases can all lead to differences in the results. Please feel free to tell me if the problem persists after you check the annotation step.
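The 0/1 conversion described above can be sketched in base R. This is a minimal sketch, not microeco code: it assumes the per-ASV table has ASVs as rows and KO identifiers as columns (check the real orientation of res_tax4fun2_reference_profile.tsv before relying on it), and the values are toy numbers for illustration only.

```r
# Toy per-ASV KO profile (assumed layout: rows = ASVs, columns = KOs).
# With real data you would read the file instead, e.g.
# ko_profile <- read.delim("path/to/res_tax4fun2_reference_profile.tsv",
#                          row.names = 1, check.names = FALSE)
ko_profile <- data.frame(
  K10944 = c(0.5, 0, 0.142857143),
  K02588 = c(0, 0.3333, 0),
  K00376 = c(1, 0, 0),
  row.names = c("ASV1", "ASV2", "ASV3"),
  check.names = FALSE
)

# Any value > 0 becomes 1 and everything else 0 (presence/absence)
ko_pa <- as.data.frame((ko_profile > 0) * 1)
print(ko_pa)
```

The fractional values are discarded on purpose here; keep the original table if you still need the copy-number-adjusted values.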

Chi

ChiLiubio, May 27 '22

Hi Chi, Thanks for microeco.

Is it possible to run the trans_diff command on the output of the FAPROTAX functional annotation of samples, as described for Tax4Fun2?

Thanks in advance.

carloshenriquezc, Nov 14 '22

Hi @carloshenriquezc Yes. Good question! Something like this.

library(microeco)
data(dataset)
t1 <- trans_func$new(dataset)
t1$cal_spe_func()
t1$cal_spe_func_perc()
# calling it again with abundance_weighted = TRUE overwrites the result above with the abundance-weighted version
t1$cal_spe_func_perc(abundance_weighted = TRUE)
# t1$res_spe_func_perc (samples as rows, functions as columns) is the result we need
# first create a microtable object, which is necessary; the table is transposed so functions become rows and samples columns
m1 <- microtable$new(as.data.frame(t(t1$res_spe_func_perc)), sample_table = dataset$sample_table)
# because there is no taxonomy table, put the abundance table directly into the taxa_abund list, which trans_diff requires as input
m1$taxa_abund$OTU <- m1$otu_table
# use "OTU" as a temporary taxa_level to run trans_diff
t2 <- trans_diff$new(dataset = m1, method = "wilcox", group = "Group", taxa_level = "OTU")
t2$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
# other methods are also available; please see the tutorial

Sure. Using trans_env is also possible, and easier, because the trans_env class already provides similar differential tests.
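To show roughly what a per-function test amounts to, here is a base-R analogue on a toy function-by-sample table: one Wilcoxon rank-sum test per function between two groups, followed by Benjamini-Hochberg (FDR) adjustment. This is a sketch under stated assumptions, not microeco's internal implementation; the function names, group labels and values are made up, and trans_diff has its own options for p-value adjustment and plotting that are described in the tutorial.

```r
# Toy function-by-sample table (rows: functions, columns: samples),
# shaped like the transposed res_spe_func_perc table used above
func_table <- matrix(c(10, 12, 11, 30, 32, 31,
                        5,  6,  5,  5,  6,  5),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("nitrification", "methanotrophy"),
                                     paste0("S", 1:6)))
group <- factor(c("A", "A", "A", "B", "B", "B"))

# One Wilcoxon rank-sum test per function (normal approximation, since
# exact p-values are not available with ties)
p_raw <- apply(func_table, 1, function(x) {
  wilcox.test(x ~ group, exact = FALSE)$p.value
})

# Benjamini-Hochberg (FDR) adjustment across all tested functions
p_adj <- p.adjust(p_raw, method = "fdr")

res <- data.frame(function_name = names(p_raw), p_raw = p_raw, p_adj = p_adj)
print(res)
```

With real data, each row of the functional table would be one FAPROTAX function and the grouping vector would come from a column of the sample table, such as "Group".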

Best, Chi

ChiLiubio, Nov 15 '22

Hi Chi,

Thanks for your support.

I have another question: is there a way to show the functional plot by group of samples?

I tried using this:

dataset$merge_samples(use_group = "S_T")
t2 <- trans_func$new(dataset)
t2$cal_spe_func(prok_database = "FAPROTAX")
t2$cal_spe_func_perc(abundance_weighted = FALSE)
t2$plot_spe_func_perc()

But the graph still shows the functional traits of each individual sample.

Thanks in advance!

Carlos Henríquez C. PhD. Investigador Titular Centro de Estudios Avanzados en Zonas Áridas (CEAZA) Laboratorio de Fisiología y Genética Marina (FIGEMA)

Fono: (+56 51) 2673262

Larrondo 1281, Coquimbo Chile


carloshenriquezc, Mar 16 '23

Hi. The first line returns a new object, which should be assigned to a name; otherwise, it is only printed.

# merge_samples() returns a new microtable object, so assign it to a name
d1 <- dataset$merge_samples(use_group = "S_T")
t2 <- trans_func$new(d1)
t2$cal_spe_func(prok_database = "FAPROTAX")
t2$cal_spe_func_perc(abundance_weighted = FALSE)
t2$plot_spe_func_perc()

Chi

ChiLiubio, Mar 16 '23

awesome

Thanks Chi!


carloshenriquezc, Mar 16 '23

Hi Chi, Microeco is getting more and more complete.

I have noticed that when I run the differential abundance analysis using FungalTraits, two overlapping plots are generated. This does not happen if I use FUNGuild.

I attach both.

Another question: how can I specify the order of the samples without using merge_samples?

Thank you very much again.


carloshenriquezc, May 06 '23

Hi @carloshenriquezc

Thanks. I don't see any attachment. This may be because attachments are not shown when you reply to the email directly. Could you please write your message and post it here on GitHub? I can't understand the problem without the data, so please paste your script and your data (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function) so that I can reproduce it.

Chi

ChiLiubio, May 06 '23

Thanks, I will post it on GitHub.

Attached are the plots.


carloshenriquezc, May 06 '23


Hello Jake!

I'm looking for an answer to the same question, specifically how the N-cycling community and its metabolic potential change over time in each sample. Have you figured this out with Tax4Fun2?

Thank you in advance! Alex

ghost, May 22 '23

Hi Alex,

It has been a while, but here is what I remember: there is a folder that gets created when using Tax4Fun2 through microeco, and in it is a file called functional_prediction.txt with KO numbers for each sequence. For what you want to look at, you should change all values > 0 to 1 (Chi explains what the fractions mean further up the thread), and this will be your gene presence/absence matrix. Then all you need is a list of the KO numbers you are interested in for N cycling, and you can subset or filter them from there. The graphing exploration I did from this point on was all done with ggplot and not microeco (though there are additional features now that might be useful).
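The filtering step described above can be sketched in base R. All tables here are toy data, the ASV-by-KO orientation is an assumption, and the KO list (amoA/B/C as K10944 to K10946, nifH as K02588, nirK as K00368, nosZ as K00376) is only an illustrative subset of nitrogen-cycling genes, not a curated list. The sketch flags ASVs that carry at least one of these KOs and then sums their relative abundance per sample, which is one possible way to follow the predicted N-cycling community over time.

```r
# Toy presence/absence matrix (assumed layout: rows = ASVs, columns = KOs),
# e.g. after converting all values > 0 to 1; K00001 stands in for a non-N KO
ko_pa <- matrix(c(1, 0, 0,
                  0, 1, 0,
                  0, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("ASV1", "ASV2", "ASV3"),
                                c("K10944", "K02588", "K00001")))

# Toy ASV abundance table (rows = ASVs, columns = samples)
otu_table <- matrix(c(10,  5,
                       0, 20,
                      30, 30),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("ASV1", "ASV2", "ASV3"),
                                    c("S1", "S2")))

# Illustrative N-cycling KOs (not exhaustive)
n_kos <- c("K10944", "K10945", "K10946", "K02588", "K00368", "K00376")

# Keep only the KOs that actually occur in the profile, then flag ASVs
# carrying at least one of them
n_cols <- intersect(n_kos, colnames(ko_pa))
n_asvs <- rownames(ko_pa)[rowSums(ko_pa[, n_cols, drop = FALSE]) > 0]

# Relative abundance of the predicted N-cycling ASVs in each sample
rel_abund <- sweep(otu_table, 2, colSums(otu_table), "/")
n_share <- colSums(rel_abund[n_asvs, , drop = FALSE])
print(n_share)
```

Subsetting per gene (for example only the amoA KO) works the same way with a shorter KO vector.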

As an alternative to Tax4Fun2, you could try PICRUSt2, which is supposed to be more accurate and to produce a larger number of predictions. It can be set up to give you KO, EC number, and MetaCyc pathway annotations, which can be used in a similar way to what I described above.

I hope this helps! Let me know if you have any more questions.

Jake Callaghan

calla404, May 22 '23

Hi Chi,

I was trying to convert the output pathway files of PICRUSt2 to a microtable object.

I receive this message: Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input

When running:

tmp_file_path <- system.file("extdata", "path_abun_unstrat.tsv", package = "file2meco")
pathway_table <- read.delim(tmp_file_path, row.names = 1)

Find attached the table.

Hope you can help me with this!


carloshenriquezc, May 27 '23

Hi Carlos, Please find this part in the tutorial (https://chiliubio.github.io/microeco_tutorial/file2meco-package.html#picrust2).
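One thing worth checking, stated as an assumption rather than a confirmed diagnosis: system.file() only looks inside the installed package and returns an empty string when the requested file is not there, so it cannot point at your own PICRUSt2 output; read.delim() also raises "no lines available in input" when the file it reads is empty. The minimal base-R sketch below reads a pathway table from an explicit path instead. A toy file is written to a temporary directory so the example runs on its own; the pathway IDs and values in it are made up.

```r
# Hypothetical path; replace it with the location of your own PICRUSt2 output.
# Here a small toy file is written first so the sketch is self-contained.
pathway_file <- file.path(tempdir(), "path_abun_unstrat.tsv")
writeLines(c("pathway\tS1\tS2",
             "PWY-1\t10.5\t3.0",
             "PWY-2\t0\t7.25"),
           pathway_file)

# system.file() returns "" when a file is not found in an installed package,
# so check the path before reading
stopifnot(nzchar(pathway_file), file.exists(pathway_file))

# Pathways as rows, samples as columns; keep sample names unchanged
pathway_table <- read.delim(pathway_file, row.names = 1, check.names = FALSE)
print(pathway_table)
```

The resulting data frame can then be passed on following the file2meco tutorial section linked above.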

Chi

ChiLiubio, May 28 '23