Functional/orthology analysis

Open botellaflotante opened this issue 4 years ago • 23 comments

If I wanted to analyze KOs / KEGG pathways or COGs between MAGs from different environments, what would you recommend doing with the eggNOG annotations from ATLAS? As there are many KO numbers/COGs, which ones should I use? Do you recommend any R package for this?

botellaflotante avatar Nov 13 '20 20:11 botellaflotante

I suggest you first run this ATLAS extension: https://github.com/metagenome-atlas/atlas_analyze

It produces :sparkles: such a cool report :sparkles:

Among other things, it calculates the KOs per sample based on the MAGs present.

There is Python code to reproduce the report or, if you prefer R, there is an R version. Setup can be done via the conda environment.

Also have a look at the interactive version on a test dataset.

If your MAGs give you only low coverage, you might also want to quantify the genes directly.

SilasK avatar Nov 15 '20 21:11 SilasK

Nice! Is it possible to run it if I have multiple samples that were run separately in ATLAS?

botellaflotante avatar Nov 16 '20 09:11 botellaflotante

The short answer is no.

I suggest you combine the different projects. This allows a consistent quantification of all samples on the same MAGs/genecatalog. Merging shouldn't be complicated: it's just copying the sample folders into a new folder with a samples.tsv describing all samples. Also copy the reports/stats/log folders into the new working directory. This way, the assembly and binning for each sample are kept and only the genecatalog and genomes steps are rerun.
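A minimal sketch of the merge described above, not an official ATLAS tool. It assumes that the first column of each samples.tsv holds the sample name, that each sample folder is named after its sample, and that the shared folders are called reports, stats and logs. The function name `merge_atlas_projects` is hypothetical, and any paths stored inside samples.tsv are copied unchanged.

```python
import csv
import shutil
from pathlib import Path


def merge_atlas_projects(project_dirs, merged_dir, shared_dirs=("reports", "stats", "logs")):
    """Copy sample folders from several working directories into a new one.

    Assumption: column 1 of samples.tsv is the sample name and matches the
    name of the sample folder. Returns the list of merged sample names.
    """
    projects = [Path(p) for p in project_dirs]
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)

    header, rows = None, []
    for project in projects:
        with open(project / "samples.tsv", newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            project_header = next(reader)
            if header is None:
                header = project_header
            elif project_header != header:
                raise ValueError(f"samples.tsv columns differ in {project}")
            for row in reader:
                if not row:
                    continue
                sample = row[0]
                # copytree fails if the target exists, so duplicate
                # sample names across projects are reported as errors
                shutil.copytree(project / sample, merged / sample)
                rows.append(row)

    # The shared folders are taken from the first project only
    for name in shared_dirs:
        if (projects[0] / name).is_dir():
            shutil.copytree(projects[0] / name, merged / name)

    # Write one samples.tsv describing all samples
    with open(merged / "samples.tsv", "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return [row[0] for row in rows]
```

It could be called as `merge_atlas_projects(["project_A", "project_B"], "merged_project")`, where the directory names are placeholders. The config.yaml is not handled here and would have to be copied by hand.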

If you don't want to combine the ATLAS projects, you can run atlas_analyze on each of them. You should get the relative abundance of genomes and functions for each project. Then you can combine the tables (this you have to do yourself).
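Combining the per-project tables could look roughly like this. The sketch assumes tab-separated tables with features (e.g. KOs) as rows and samples as columns, which may not match the files atlas_analyze actually writes; the function names are hypothetical. Features missing from a project are filled with 0.

```python
import csv


def read_abundance_table(path):
    """Read a TSV with features as rows and samples as columns (assumed layout).

    Returns {sample: {feature: abundance}}.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        table = {sample: {} for sample in samples}
        for row in reader:
            if not row:
                continue
            feature, values = row[0], row[1:]
            for sample, value in zip(samples, values):
                table[sample][feature] = float(value)
    return table


def combine_tables(tables):
    """Merge several {sample: {feature: abundance}} tables into one.

    Features absent from a sample get an abundance of 0.0.
    """
    combined = {}
    for table in tables:
        for sample, abundances in table.items():
            if sample in combined:
                raise ValueError(f"sample {sample} occurs in more than one table")
            combined[sample] = dict(abundances)
    features = sorted({f for abundances in combined.values() for f in abundances})
    return {
        sample: {f: abundances.get(f, 0.0) for f in features}
        for sample, abundances in combined.items()
    }
```

This only makes sense for features whose identifiers mean the same thing in every project, such as KO numbers.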

SilasK avatar Nov 16 '20 09:11 SilasK

I don't yet have a statistical test for comparing functions, only for genome abundances. But I would simply take a non-parametric test, e.g. Mann-Whitney, and then correct for multiple testing. A PCA of log2(pseudocount + relative abundance) could also be an idea. If you have other ideas on how to compare the functions between different samples, I'm eager to hear them.
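A rough, dependency-free sketch of that idea: a two-sided Mann-Whitney U test per function, followed by a Benjamini-Hochberg correction. The p-value here uses the normal approximation with continuity correction and no tie correction, which is an assumption of this sketch and is crude for very small groups. In practice `scipy.stats.mannwhitneyu` and `statsmodels.stats.multitest.multipletests` would be the usual choices. The abundance values are made up for illustration.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts the pairs where the x value is larger; ties count as half
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Continuity correction, clipped at zero
    z = max((abs(u - mean_u) - 0.5) / sd_u, 0.0)
    p = 2 * (1 - NormalDist().cdf(z))
    return u, min(p, 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical relative abundances of two functions in two groups of samples
abundance = {
    "K00001": ([0.10, 0.12, 0.11, 0.13, 0.09], [0.30, 0.28, 0.35, 0.31, 0.29]),
    "K00002": ([0.20, 0.25, 0.22, 0.21, 0.24], [0.23, 0.19, 0.26, 0.22, 0.20]),
}
functions = list(abundance)
pvalues = [mann_whitney_u(*abundance[f])[1] for f in functions]
for function, p, q in zip(functions, pvalues, benjamini_hochberg(pvalues)):
    print(function, round(p, 4), round(q, 4))
```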

SilasK avatar Nov 16 '20 09:11 SilasK

OK! To combine them, in the case of the reports/stats/log folders, should I copy only one of them (any) into the new working dir?

botellaflotante avatar Nov 16 '20 12:11 botellaflotante

Yes, copy any of them.

SilasK avatar Nov 16 '20 13:11 SilasK

Hi Silas! When running analyze.py on these samples I got this error. Has it happened to you before?

output: Results/Summary.html shell: jupyter nbconvert --output Summary --TemplateExporter.exclude_input=True Results/Code.ipynb (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

It gives me these files/folders in Results, though, but I think the final analyses are missing:

annotations counts mapping_rate.tsv Code.ipynb genome_completeness.tsv taxonomy.tsv

Another silly question: when I run multiple samples, this analysis will compare MAGs which are common to all samples (all samples have mapped reads for these), but those which are specific to a sample (or not common to all) will be lost, right?

Many thanks!!

botellaflotante avatar Nov 17 '20 00:11 botellaflotante

No, sorry, this is not thoroughly tested. Is there no other stdout output? Did you set up the atlas_analyze environment as described in the README of atlas_analyze?

What happens when you run 'jupyter nbconvert'? Maybe you need to install something more.

SilasK avatar Nov 23 '20 12:11 SilasK

I think this should solve your problem https://github.com/metagenome-atlas/atlas_analyze/issues/1

SilasK avatar Nov 26 '20 08:11 SilasK

Thanks! I already used the scripts without nbconvert, very useful!

I suggest you combine the different projects. This allows a consistent quantification of all samples on the same MAGs/genecatalog. Merging shouldn't be complicated: it's just copying the sample folders into a new folder with a samples.tsv describing all samples. Also copy the reports/stats/log folders into the new working directory. This way, the assembly and binning for each sample are kept and only the genecatalog and genomes steps are rerun.

When I copy the sample directories to a new one and add the reports/stats/log folders together with the config and samples files, I get this error when trying to run the genomes step:

Building DAG of jobs... CyclicGraphException in line 338 of /home/miniconda3/envs/atlasenv/lib/python3.6/site-packages/atlas/rules/genomes.smk: Cyclic dependency on rule bam_2_sam_MAGs.

This is what I have in the dir: config.yaml reports SRR6797243 SRR6797246 logs samples.tsv SRR6797244 stats

I changed the samples.tsv to these 3 samples ...

best

botellaflotante avatar Nov 30 '20 19:11 botellaflotante

I know this is an issue that I should fix.

What happens if you run 'atlas run genecatalog' or 'atlas run genomes'?

SilasK avatar Dec 01 '20 17:12 SilasK

Well, this happens with any option (genomes, genecatalog, or all). I tried copying some BAM files to genomes/alignment but it did not work... I looked into the genomes .smk file but I can't find where the problem would be (the input asks for this BAM file... but...). Best

botellaflotante avatar Dec 07 '20 00:12 botellaflotante

I think this may be useful for adding some more samples to a project, or for comparing different samples which were run independently before, without having to run everything from scratch, right?

botellaflotante avatar Dec 07 '20 00:12 botellaflotante

I don't yet have a statistical test for comparing functions, only for genome abundances. But I would simply take a non-parametric test, e.g. Mann-Whitney, and then correct for multiple testing. A PCA of log2(pseudocount + relative abundance) could also be an idea. If you have other ideas on how to compare the functions between different samples, I'm eager to hear them.

So, would it be possible to use ALDEx2 or something similar to do a differential abundance analysis of MAGs on CLR-transformed raw counts between samples? Or would you use relative abundances for that?

botellaflotante avatar Dec 15 '20 16:12 botellaflotante

First of all, sorry for the error about the cyclic dependency.

Cyclic dependency on rule bam_2_sam_MAGs.

Is this still a blocking issue for you? My plan would be to drop the storage of BAM files, which removes the cyclic dependency.

By any chance, @jmtsuji, did you encounter the cyclic dependency?

For your question about statistics: have a look at the brand new Tutorial, where I show how to use ALDEx2 for the genome abundance. https://github.com/metagenome-atlas/Tutorial

The abundance of functions is based on the relative abundance of genomes, therefore it's better to use a non-parametric test instead of ALDEx2.

Kind regards Silas

SilasK avatar Dec 15 '20 19:12 SilasK

Thanks Silas. Well, I am running the samples again from scratch. But I think it would be good to have the option to add new samples to a project that has already been run. If you guide me, I can try to do it...

I will check the tutorial, thanks. I guess that is what I was looking for: some way to assess whether differences in MAG abundance/composition are significant or not... Do you think this is only applicable to MAGs above a minimum completeness value, or would it also work for MAGs with 20% completeness?

Best

botellaflotante avatar Dec 15 '20 21:12 botellaflotante

@SilasK Interesting -- I've never run into that cyclic dependency error before. (Also, I regularly look through the BAM files output by ATLAS to link unassembled read analyses I do outside of ATLAS to genome bins produced by ATLAS, so I at least appreciate having access to them. :-) )

jmtsuji avatar Dec 16 '20 05:12 jmtsuji

Do you know why it is better to use CLR than the count normalization from the DESeq package to analyze MAG abundances across samples? The latter should correct for sequencing depth and also for compositional differences caused by changes in highly abundant MAGs, right? Thanks

botellaflotante avatar Jan 05 '21 20:01 botellaflotante

I think I will point you to this article http://journal.frontiersin.org/article/10.3389/fmicb.2017.02224/full

In the end, CLR and the DESeq normalisation are not completely different. It's just that the CLR explicitly takes the compositional nature into account.
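For reference, a minimal sketch of a plain centered log-ratio (CLR) transformation of one sample. The pseudocount of 0.5 for zero counts is an assumption of this sketch. ALDEx2 itself handles zeros differently (it draws Monte Carlo instances from a Dirichlet distribution), so this is not a reimplementation of it.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean of all logs.

    The pseudocount (an assumption here) avoids log(0) for absent features.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts of four MAGs in one sample
sample_counts = [120, 30, 0, 850]
print([round(v, 3) for v in clr(sample_counts)])
```

Because each value is expressed relative to the geometric mean of the sample, multiplying all counts by a constant (e.g. deeper sequencing) leaves the result unchanged when no pseudocount is needed.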

SilasK avatar Jan 06 '21 08:01 SilasK

OK, thanks. Would it also be possible to use a compositional approach for KO and CAZy analyses?

botellaflotante avatar Jan 10 '21 23:01 botellaflotante

I don't know how to use a compositional approach for functions. If you have an idea, I'm happy to discuss it. The approach implemented in ATLAS, e.g. in the Tutorial, sums the relative abundance of the species that have a certain function. The problem is that sums are no longer possible if you use log ratios.

Another approach would be to look at presence/absence, but this is also not CoDA.
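To make the summing approach concrete, here is a small sketch for a single sample. The data structures, function names and numbers are hypothetical, not the ones used in ATLAS or the Tutorial.

```python
def function_abundance(genome_abundance, genome_functions):
    """Sum the relative abundance of all genomes that carry each function."""
    totals = {}
    for genome, abundance in genome_abundance.items():
        for function in genome_functions.get(genome, ()):
            totals[function] = totals.get(function, 0.0) + abundance
    return totals


def function_presence(genome_abundance, genome_functions):
    """Presence/absence variant: functions of any genome with abundance > 0."""
    present = set()
    for genome, abundance in genome_abundance.items():
        if abundance > 0:
            present.update(genome_functions.get(genome, ()))
    return present


# Hypothetical relative abundances of three MAGs in one sample
sample = {"MAG1": 0.5, "MAG2": 0.3, "MAG3": 0.2}
# Hypothetical KO annotations per MAG
annotations = {
    "MAG1": {"K00001", "K00002"},
    "MAG2": {"K00001"},
    "MAG3": {"K00003"},
}
print(function_abundance(sample, annotations))
print(sorted(function_presence(sample, annotations)))
```

Since a function carried by several genomes adds up their abundances, the per-function values of a sample do not sum to 1.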

SilasK avatar Jan 11 '21 07:01 SilasK

Yes, I don't know if that's possible either. Isn't it possible to map and group the raw reads associated with each KO to do CoDA or something like this? Do you know how to map these K numbers to level 2/level 3 functional categories in KEGG? Is there any standard/easy way of doing this mapping, to get something like a table with gene counts and functional categories? I think one way would be to download this JSON (https://www.genome.jp/kegg-bin/show_brite?ko00001) and then map the K numbers to it... but maybe there's an easier way. Thanks!
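A sketch of the JSON idea, under the assumption that the downloaded file is a nested tree of objects with "name" and "children" keys, with the K numbers as leaves. The small EXAMPLE tree below is hand-written to mimic that assumed layout and should be replaced by the real download; the function names are hypothetical.

```python
import json
import re

# A leaf name is assumed to start with the K number, e.g. "K00844  HK; ..."
K_NUMBER = re.compile(r"^(K\d{5})\b")


def _walk(node, ancestors):
    """Yield (K number, category names from top to bottom) for every leaf."""
    match = K_NUMBER.match(node["name"])
    if match:
        yield (match.group(1),) + ancestors
    else:
        for child in node.get("children", []):
            yield from _walk(child, ancestors + (node["name"],))


def ko_categories(root):
    """Return one row per (K number, category path) found below the root."""
    rows = []
    for top in root.get("children", []):
        rows.extend(_walk(top, ()))
    return rows


# Hand-written fragment in the assumed layout of the ko00001 hierarchy
EXAMPLE = json.loads("""
{"name": "ko00001", "children": [
  {"name": "09100 Metabolism", "children": [
    {"name": "09101 Carbohydrate metabolism", "children": [
      {"name": "00010 Glycolysis / Gluconeogenesis [PATH:ko00010]", "children": [
        {"name": "K00844  HK; hexokinase [EC:2.7.1.1]"}
      ]}
    ]}
  ]}
]}
""")

for row in ko_categories(EXAMPLE):
    print("\t".join(row))
```

A K number can sit under several pathways, so it may produce several rows. Summing gene counts per category would then count such genes more than once.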

botellaflotante avatar Jan 18 '21 13:01 botellaflotante

I'm experimenting with DRAM #360

SilasK avatar Jan 18 '21 14:01 SilasK

There has been no activity for some time. I hope your issue has been solved in the meantime. This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.

github-actions[bot] avatar Apr 04 '23 13:04 github-actions[bot]