eggnog-mapper

too many GO annotations

Status: Open. biomichal opened this issue 3 years ago • 6 comments

Hi. I got puzzled when using the GO functional annotations for an enrichment analysis. In many cases, one gene gets a lot of GO IDs in the results, more than with other methods such as Blast2GO, InterProScan, etc. So how should I use these GO IDs, and why are there so many GO annotations? Thanks.

One example: gene1044 GO:0002376,GO:0003674,GO:0003824,GO:0004587,GO:0005488,GO:0005515,GO:0005575,GO:0005622,GO:0005623,GO:0005737,GO:0005739,GO:0005759,GO:0006082,GO:0006520,GO:0006525,GO:0006527,GO:0006536,GO:0006560,GO:0006561,GO:0006591,GO:0006593,GO:0006807,GO:0006950,GO:0006952,GO:0006955,GO:0006970,GO:0006972,GO:0008144,GO:0008150,GO:0008152,GO:0008219,GO:0008270,GO:0008483,GO:0008652,GO:0009056,GO:0009058,GO:0009063,GO:0009064,GO:0009065,GO:0009084,GO:0009605,GO:0009607,GO:0009617,GO:0009626,GO:0009628,GO:0009651,GO:0009814,GO:0009816,GO:0009987,GO:0012501,GO:0016053,GO:0016054,GO:0016740,GO:0016769,GO:0018130,GO:0019544,GO:0019752,GO:0019842,GO:0030170,GO:0031974,GO:0033554,GO:0034050,GO:0036094,GO:0042538,GO:0042742,GO:0042802,GO:0043167,GO:0043168,GO:0043169,GO:0043207,GO:0043226,GO:0043227,GO:0043229,GO:0043231,GO:0043233,GO:0043436,GO:0043648,GO:0044237,GO:0044238,GO:0044248,GO:0044249,GO:0044281,GO:0044282,GO:0044283,GO:0044422,GO:0044424,GO:0044429,GO:0044444,GO:0044446,GO:0044464,GO:0045087,GO:0046394,GO:0046395,GO:0046483,GO:0046872,GO:0046914,GO:0048037,GO:0050662,GO:0050896,GO:0051179,GO:0051640,GO:0051641,GO:0051646,GO:0051704,GO:0051707,GO:0051716,GO:0070013,GO:0070279,GO:0071704,GO:0097159,GO:0098542,GO:1901360,GO:1901362,GO:1901363,GO:1901564,GO:1901565,GO:1901566,GO:1901575,GO:1901576,GO:1901605,GO:1901606,GO:1901607

biomichal, Sep 10 '22 02:09

Hi @biomichal,

I am not an expert on GO, but you probably got so many GO terms because they come from different proteins in the orthologous group. Some of them will be very general, others much more specific but probably also less common within the orthologous group.

Maybe you could try to summarize them by translating them to GO Slim (which we hope to include in future versions). I recently used the GSEA tools in R to do it. You could also use tools like topGO. I am not sure whether there are more recent tools to cope with this. You could also create plots, like those produced by GO-Figure! Or you could parse the GO graph yourself using the .obo or .xml files from the GO website. I guess it also depends on your actual project and goals.
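For the last option, a minimal sketch of parsing the ontology and mapping annotated terms to a GO Slim might look like the following. It assumes the OBO text layout used by go-basic.obo ([Term] stanzas with id:, namespace:, is_a: and relationship: part_of lines); the embedded sample is a toy excerpt with abbreviated edges, and parse_obo, ancestors and map_to_slim are hypothetical helper names, not part of eggNOG-mapper. A term maps to every slim term found among its ancestors (or itself).

```python
from collections import defaultdict

# Toy OBO excerpt (real IDs, abbreviated edges); use go-basic.obo for real work.
SAMPLE_OBO = """
[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0008152
name: metabolic process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0044237
name: cellular metabolic process
namespace: biological_process
is_a: GO:0008152 ! metabolic process

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0003824
name: catalytic activity
namespace: molecular_function
is_a: GO:0003674 ! molecular_function

[Term]
id: GO:0008483
name: transaminase activity
namespace: molecular_function
is_a: GO:0003824 ! catalytic activity
"""


def parse_obo(text):
    """Return ({term: namespace}, {term: set of parent terms}) from OBO text."""
    namespace, parents = {}, defaultdict(set)
    in_term, term_id = False, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # New stanza; only [Term] stanzas are used ([Typedef] is skipped).
            in_term, term_id = line == "[Term]", None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and term_id and line.startswith("namespace: "):
            namespace[term_id] = line[len("namespace: "):]
        elif in_term and term_id and line.startswith("is_a: "):
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            parents[term_id].add(line[len("is_a: "):].split()[0])
        elif in_term and term_id and line.startswith("relationship: part_of "):
            # "relationship: part_of GO:0005737 ! cytoplasm" -> "GO:0005737"
            parents[term_id].add(line.split()[2])
    return namespace, parents


def ancestors(term, parents):
    """All terms reachable from `term` by following parent edges."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(terms, slim, parents):
    """Slim terms that are any of `terms` or one of their ancestors."""
    hits = set()
    for term in terms:
        hits |= (ancestors(term, parents) | {term}) & slim
    return hits


ns, par = parse_obo(SAMPLE_OBO)
slim_hits = map_to_slim({"GO:0044237", "GO:0008483"}, {"GO:0008152", "GO:0003824"}, par)
```

With the toy data, the two specific terms collapse onto the two slim terms metabolic process and catalytic activity. A real slim (for example one of the generic slims distributed by the GO Consortium) would simply be a larger set passed as `slim`.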

Just my 2 cents.

Best, Carlos

Cantalapiedra, Sep 10 '22 07:09

Hi @Cantalapiedra

Many thanks for your suggestions; I will try them later.

If they come from different proteins in the orthologous group, it is normal to get such a long list of GO terms. But it would be better to display only the best hit.

Besides, the current version does not explicitly classify the terms into the three GO ontologies: Biological Process, Cellular Component and Molecular Function. I hope to see this in a future version.

I also noticed the parameter --go_evidence experimental|non-electronic|all, which defines what type of GO terms should be used for annotation: experimental = use only terms inferred from experimental evidence; non-electronic (default) = use only non-electronically curated terms; all = retrieve all GO terms.

How about using experimental? I am trying it now.

Best

biomichal, Sep 10 '22 08:09

Hi @biomichal,

Ah yes, of course. I forgot that you can also narrow the results using --go_evidence. You could also play with other parameters, like --tax_scope, --tax_scope_mode, --target_orthologs and --target_taxa.

I am not sure whether it is really better to display only the best hit. There will be cases where you only get enough sensitivity by retrieving annotation terms from all orthologs. It is something that (I guess) can vary from one orthologous group to another.

Regarding the three ontologies, you will get that information as soon as you retrieve the details of each GO term, since every term belongs to exactly one ontology. However, it is a valid suggestion that we may consider for future releases. Thank you.
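Because each GO term belongs to exactly one ontology, splitting a gene's GO string into BP, CC and MF is a lookup once a term-to-namespace map is available. A minimal sketch, assuming the comma-separated format of the GO column shown above with "-" for a missing value; the NAMESPACE dict is a hand-written stand-in (in practice you would build it from the namespace: lines of go-basic.obo), GO:9999999 is a made-up ID used only to show the fallback, and split_by_ontology is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written stand-in; build this from go-basic.obo in practice.
NAMESPACE = {
    "GO:0008150": "biological_process",
    "GO:0008152": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0005737": "cellular_component",
    "GO:0003674": "molecular_function",
    "GO:0003824": "molecular_function",
}
SHORT = {
    "biological_process": "BP",
    "cellular_component": "CC",
    "molecular_function": "MF",
}


def split_by_ontology(go_field, namespace):
    """Split a comma-separated GO string into {"BP"|"CC"|"MF"|"unknown": set}."""
    groups = defaultdict(set)
    for term in go_field.split(","):
        term = term.strip()
        if term and term != "-":
            # Terms missing from the map (e.g. obsolete IDs) go to "unknown".
            groups[SHORT.get(namespace.get(term), "unknown")].add(term)
    return dict(groups)


example = split_by_ontology("GO:0008152,GO:0005737,GO:0003824,GO:9999999", NAMESPACE)
```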

Best, Carlos

Cantalapiedra, Sep 10 '22 09:09

Also see #389, I suppose. That is, I don't think the problem is too many annotation sources, but that eggNOG-mapper prints the entire GO graph (or at the very least redundant levels of information) instead of only the tips of the graph.
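If you want only the tips of the graph, one approach is to drop every term that is an ancestor of another term in the same gene's list. A minimal sketch, assuming a parent map built from the ontology's is_a (and optionally part_of) edges; the PARENTS dict below is a toy with abbreviated edges over a few IDs taken from gene1044's list, and most_specific is a hypothetical helper name, not an eggNOG-mapper option.

```python
# Toy parent map over a few IDs from gene1044's list (edges abbreviated).
PARENTS = {
    "GO:0008152": {"GO:0008150"},
    "GO:0009987": {"GO:0008150"},
    "GO:0044237": {"GO:0008152", "GO:0009987"},
    "GO:0003824": {"GO:0003674"},
    "GO:0016740": {"GO:0003824"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent edges."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(terms, parents):
    """Keep only terms that are not an ancestor of another term in the set."""
    terms = set(terms)
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents)
    return terms - redundant


annotated = {
    "GO:0008150", "GO:0008152", "GO:0009987", "GO:0044237",
    "GO:0003674", "GO:0003824", "GO:0016740",
}
tips = most_specific(annotated, PARENTS)
```

Here seven terms reduce to cellular metabolic process and transferase activity. This is mainly useful for reading or displaying the results; many enrichment tools (topGO, for example) propagate annotations up the graph on their own.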

marchoeppner, Jan 03 '23 14:01

Hi @biomichal, can I ask what you decided to use in the end? I also want to do an enrichment analysis, and I am not sure whether I have to use all the GO terms given for one gene. I also have a situation where I get a lot of GO terms per transcript. Thanks!
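Most enrichment tools start from a gene-to-GO mapping, so a first step is usually to extract one from the emapper output. A minimal sketch, assuming the v2.x .annotations layout: comment lines start with "##", the header line starts with "#" and contains a column named GOs, and missing values are "-". Please check these assumptions against your own file; read_gene2go and the SAMPLE text are illustrative only.

```python
import io


def read_gene2go(handle):
    """Map each query ID to its set of GO terms from an .annotations file."""
    header, go_col, gene2go = None, None, {}
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # blank lines and "##" metadata lines
        fields = line.split("\t")
        if header is None:
            if line.startswith("#"):
                # Header line, e.g. "#query<TAB>seed_ortholog<TAB>...<TAB>GOs<TAB>..."
                header = [f.lstrip("#") for f in fields]
                go_col = header.index("GOs")
            continue
        gos = fields[go_col] if go_col < len(fields) else "-"
        if gos not in ("", "-"):
            gene2go[fields[0]] = set(gos.split(","))
    return gene2go


# Toy file content with only three columns; real files have many more.
SAMPLE = (
    "## toy metadata line\n"
    "#query\tseed_ortholog\tGOs\n"
    "gene1044\tX\tGO:0008150,GO:0003674\n"
    "gene2\tY\t-\n"
)
example = read_gene2go(io.StringIO(SAMPLE))

# Long (gene, term) pairs, a shape many enrichment tools can import.
pairs = sorted((gene, term) for gene, terms in example.items() for term in terms)
```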

LadaJov, Aug 21 '23 13:08

Hi all,

I'm also struggling with the GO terms. Did anyone find a way to plot CC, BP and MF charts at a specific level from the output?
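One common way to define the "level" of a GO term is the length of the shortest path up to its ontology root; other tools define levels differently, so treat this definition as an assumption. A minimal sketch that counts genes per term at a chosen level and ontology, assuming each gene's list already includes the propagated ancestors (as gene1044's list above appears to, since it contains the three root terms GO:0008150, GO:0003674 and GO:0005575). The toy PARENTS, NAMESPACE and GENE2GO data and the helper names are hypothetical; the resulting counts can be passed to any plotting library for a bar chart.

```python
from collections import Counter
from functools import lru_cache

# Toy data: term -> parents (roots map to an empty set), abbreviated edges.
PARENTS = {
    "GO:0008150": set(),
    "GO:0008152": {"GO:0008150"},
    "GO:0009987": {"GO:0008150"},
    "GO:0044237": {"GO:0008152", "GO:0009987"},
    "GO:0003674": set(),
    "GO:0003824": {"GO:0003674"},
}
NAMESPACE = {
    "GO:0008150": "BP", "GO:0008152": "BP", "GO:0009987": "BP",
    "GO:0044237": "BP", "GO:0003674": "MF", "GO:0003824": "MF",
}
GENE2GO = {
    "gene1": {"GO:0008150", "GO:0008152", "GO:0009987", "GO:0044237",
              "GO:0003674", "GO:0003824"},
    "gene2": {"GO:0008150", "GO:0009987"},
}


def make_level(parents):
    """Return a function giving each term's shortest distance to a root."""
    @lru_cache(maxsize=None)
    def level(term):
        if term not in parents:
            return None  # unknown or obsolete term
        if not parents[term]:
            return 0  # ontology root
        known = [lv for lv in (level(p) for p in parents[term]) if lv is not None]
        return 1 + min(known) if known else None
    return level


def counts_at_level(gene2go, parents, namespace, ontology, target):
    """Count genes annotated to each `ontology` term at depth `target`."""
    level = make_level(parents)
    counts = Counter()
    for terms in gene2go.values():
        for term in terms:
            if namespace.get(term) == ontology and level(term) == target:
                counts[term] += 1
    return counts


bp_level1 = counts_at_level(GENE2GO, PARENTS, NAMESPACE, "BP", 1)
```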

vebaev, Jan 15 '24 18:01