eggnog-mapper
too many GO annotations
Hi. I am puzzled by the GO functional annotation that I am using for enrichment analysis. In many cases a single gene is assigned a lot of GO IDs, many more than with other methods such as Blast2GO, InterProScan, etc. How should I use these GO IDs, and why are there so many GO annotations? Thanks.
One example: gene1044 GO:0002376,GO:0003674,GO:0003824,GO:0004587,GO:0005488,GO:0005515,GO:0005575,GO:0005622,GO:0005623,GO:0005737,GO:0005739,GO:0005759,GO:0006082,GO:0006520,GO:0006525,GO:0006527,GO:0006536,GO:0006560,GO:0006561,GO:0006591,GO:0006593,GO:0006807,GO:0006950,GO:0006952,GO:0006955,GO:0006970,GO:0006972,GO:0008144,GO:0008150,GO:0008152,GO:0008219,GO:0008270,GO:0008483,GO:0008652,GO:0009056,GO:0009058,GO:0009063,GO:0009064,GO:0009065,GO:0009084,GO:0009605,GO:0009607,GO:0009617,GO:0009626,GO:0009628,GO:0009651,GO:0009814,GO:0009816,GO:0009987,GO:0012501,GO:0016053,GO:0016054,GO:0016740,GO:0016769,GO:0018130,GO:0019544,GO:0019752,GO:0019842,GO:0030170,GO:0031974,GO:0033554,GO:0034050,GO:0036094,GO:0042538,GO:0042742,GO:0042802,GO:0043167,GO:0043168,GO:0043169,GO:0043207,GO:0043226,GO:0043227,GO:0043229,GO:0043231,GO:0043233,GO:0043436,GO:0043648,GO:0044237,GO:0044238,GO:0044248,GO:0044249,GO:0044281,GO:0044282,GO:0044283,GO:0044422,GO:0044424,GO:0044429,GO:0044444,GO:0044446,GO:0044464,GO:0045087,GO:0046394,GO:0046395,GO:0046483,GO:0046872,GO:0046914,GO:0048037,GO:0050662,GO:0050896,GO:0051179,GO:0051640,GO:0051641,GO:0051646,GO:0051704,GO:0051707,GO:0051716,GO:0070013,GO:0070279,GO:0071704,GO:0097159,GO:0098542,GO:1901360,GO:1901362,GO:1901363,GO:1901564,GO:1901565,GO:1901566,GO:1901575,GO:1901576,GO:1901605,GO:1901606,GO:1901607
Hi @biomichal ,
I am not an expert on GO, but it is likely that you got so many GO terms because they come from different proteins from the orthologous group. Some of them will be very general, others will be much more specific, but also probably less abundant in the orthologous group.
Maybe you could try to summarize them by translating them to GO Slim (which we hope to include in future versions). I recently used the GSEA tools in R to do it. You could also use tools like topGO. I am not sure whether there are more recent tools to cope with this. You could also create plots, like those obtained with GO-Figure!, or parse the GO graph yourself using the .obo or .xml files from the GO website. I guess it also depends on your actual project and goals.
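As a rough illustration of the "parse the .obo yourself" route, here is a minimal sketch of a GO Slim style summary. It is only an assumption of how one might do it: the helper names (`parse_is_a`, `ancestors`, `to_slim`) and the `TOY:` terms are made up for the example, and only `is_a` links are followed (`part_of` and other relationships are ignored).

```python
def parse_is_a(obo_text):
    """Return {term_id: set(parent_ids)} built from 'is_a:' lines in OBO text."""
    parents = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[4:].strip()
            parents.setdefault(term_id, set())
        elif in_term and term_id and line.startswith("is_a: "):
            # 'is_a: TOY:1 ! root process' -> keep only the ID
            parents[term_id].add(line[6:].split("!")[0].strip())
    return parents


def ancestors(term, parents):
    """All terms reachable from `term` by following is_a links upwards."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def to_slim(terms, slim_ids, parents):
    """Map each term to the slim terms found among itself and its ancestors."""
    slim = set()
    for term in terms:
        slim |= ({term} | ancestors(term, parents)) & slim_ids
    return slim


# Hypothetical toy ontology, NOT real GO content.
TOY_OBO = """
[Term]
id: TOY:1
name: root process

[Term]
id: TOY:2
name: metabolism
is_a: TOY:1 ! root process

[Term]
id: TOY:3
name: amino acid metabolism
is_a: TOY:2 ! metabolism

[Term]
id: TOY:4
name: response to stress
is_a: TOY:1 ! root process
"""

parents = parse_is_a(TOY_OBO)
gene_terms = {"TOY:1", "TOY:2", "TOY:3"}
print(sorted(to_slim(gene_terms, {"TOY:2", "TOY:4"}, parents)))  # prints ['TOY:2']
```

With a real ontology you would read the .obo file from the GO website and take the slim term IDs from one of the published GO Slim subsets instead of the toy set used here.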
Just my 2 cents.
Best, Carlos
Hi @Cantalapiedra
Many thanks for your suggestions, I will try them later.
If they come from different proteins of the orthologous group, it is normal to get such a long list of GO terms. But I think it would be better to display only the best hit.
Besides, the current version has no explicit classification into the three types of GO: BiologicalProcess, CellularComponent and MolecularFunction. I hope to see this in the future.
I also noticed the parameter --go_evidence experimental|non-electronic|all. It defines what type of GO terms should be used for annotation:
experimental = Use only terms inferred from experimental evidence.
non-electronic (default) = Use only non-electronically curated terms.
all = all GO terms will be retrieved.
What about using experimental? I am trying it now.
Best
Hi @biomichal ,
Ah yes, of course. I forgot that you can narrow the results also using --go_evidence. You could also play with other parameters, like --tax_scope, --tax_scope_mode, --target_orthologs and --target_taxa.
I am not sure whether it is really better to display only the best hit. There will be cases where you only get enough sensitivity by retrieving annotation terms from all orthologs. It is something that (I guess) can vary from one orthologous group to another.
Regarding the 3 ontologies, it is something you will obtain as soon as you retrieve info from the GO terms. However, it is a valid suggestion that we may consider for future releases. Thank you.
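To illustrate what "retrieving info from the GO terms" could look like, here is a small sketch that reads the `namespace:` field of each term in OBO text and splits a comma-separated GO list into the three ontologies. The function names are hypothetical, the three-term excerpt (the three GO root terms) is typed by hand rather than read from the real file, and treating `-` as an empty value is an assumption about the output format.

```python
def parse_namespaces(obo_text):
    """Return {term_id: namespace} from the [Term] stanzas of OBO text."""
    namespaces = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[4:].strip()
        elif in_term and term_id and line.startswith("namespace: "):
            namespaces[term_id] = line[len("namespace: "):].strip()
    return namespaces


def split_by_namespace(go_field, namespaces):
    """Group a comma-separated GO list by ontology; unseen IDs go to 'unknown'."""
    groups = {}
    for term in go_field.split(","):
        term = term.strip()
        if not term or term == "-":  # assumption: '-' marks "no annotation"
            continue
        groups.setdefault(namespaces.get(term, "unknown"), []).append(term)
    return groups


# Hand-typed excerpt with the three GO root terms; a real run would read go.obo.
OBO_EXCERPT = """
[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0005575
name: cellular_component
namespace: cellular_component

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process
"""

namespaces = parse_namespaces(OBO_EXCERPT)
groups = split_by_namespace("GO:0003674,GO:0005575,GO:0008150", namespaces)
for ontology, terms in groups.items():
    print(ontology, terms)
```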
Best, Carlos
Also see #389, I suppose. That is, I don't think this is an issue with too many annotation sources, but rather that eggNOG-mapper is printing the entire GO graph (or at the very least redundant levels of information) instead of only the tips of the graph.
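If you only want the tips, one possible approach (a sketch, not what eggNOG-mapper does) is to drop every term that is an ancestor of another term in the same gene's list. The helper names are hypothetical, and the small parent map below is typed by hand from `is_a` links between four molecular function terms that appear in the example above; with real data you would build it from the current .obo file.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(terms, parents):
    """Keep only terms that are not an ancestor of another term in the set."""
    terms = set(terms)
    covered = set()
    for term in terms:
        covered |= ancestors(term, parents)
    return terms - covered


# Hand-typed is_a links (child -> parents); check against the current go.obo.
PARENTS = {
    "GO:0003674": set(),            # molecular_function (root)
    "GO:0003824": {"GO:0003674"},   # catalytic activity
    "GO:0016740": {"GO:0003824"},   # transferase activity
    "GO:0005488": {"GO:0003674"},   # binding
}

gene_terms = ["GO:0003674", "GO:0003824", "GO:0016740", "GO:0005488"]
print(sorted(most_specific(gene_terms, PARENTS)))
# prints ['GO:0005488', 'GO:0016740']
```

Whether to reduce the list like this probably depends on the downstream tool, since some enrichment methods work with the graph structure themselves.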
Hi @biomichal , can I ask what you decided to use in the end? I also want to do an enrichment analysis and I am not sure whether I have to use all the GO terms given for one gene. I am also in a situation where I get a lot of GO terms per single transcript. Thanks!
Hi all,
I'm also struggling with the GO terms. Did anyone find a way to plot CC, BP and MF charts at a specific level from the output?