
Greater than 1 and negative proportions in gene ontology

Open fredericraymond opened this issue 12 years ago • 4 comments

Observed

For example, in file

/rap/nne-790-ab/projects/FR-MicrobiomeSinges/20130215_nofail/RayMeta_Sample_JE1/BiologicalAbundances/_GeneOntology/biological_process.Depth=1.tsv

Identifier  Name                              Proportion  Observations  Total
GO:0000003  reproduction                      0.0123192   267993        21754013
GO:0002376  immune system process             0.749373    16301880      21754013
GO:0008152  metabolic process                 -72.3133    -1573105383   21754013
GO:0009987  cellular process                  17.0055     369936933     21754013
GO:0022414  reproductive process              0.985647    21441772      21754013
GO:0022610  biological adhesion               0.468828    10198901      21754013
GO:0023052  signaling                         0.265745    5781018       21754013
GO:0032501  multicellular organismal process  1.76106     38310165      21754013
GO:0032502  developmental process             17.6075     383034541     21754013

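See metabolic process (GO:0008152), where both the proportion and the observation count are negative. Several other terms have proportions greater than 1.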
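The out-of-range rows in the excerpt above can be picked out mechanically. Here is a minimal Python sketch, assuming the file is tab-separated with the five columns shown. The `SAMPLE` string and the function name `find_suspect_rows` are made up for illustration and are not part of Ray.

```python
import csv
import io

# Rows copied from the report above. The real file is assumed to be
# tab-separated; SAMPLE stands in for its contents.
SAMPLE = (
    "Identifier\tName\tProportion\tObservations\tTotal\n"
    "GO:0000003\treproduction\t0.0123192\t267993\t21754013\n"
    "GO:0002376\timmune system process\t0.749373\t16301880\t21754013\n"
    "GO:0008152\tmetabolic process\t-72.3133\t-1573105383\t21754013\n"
    "GO:0009987\tcellular process\t17.0055\t369936933\t21754013\n"
    "GO:0022414\treproductive process\t0.985647\t21441772\t21754013\n"
    "GO:0022610\tbiological adhesion\t0.468828\t10198901\t21754013\n"
    "GO:0023052\tsignaling\t0.265745\t5781018\t21754013\n"
    "GO:0032501\tmulticellular organismal process\t1.76106\t38310165\t21754013\n"
    "GO:0032502\tdevelopmental process\t17.6075\t383034541\t21754013\n"
)


def find_suspect_rows(handle):
    """Return (identifier, name, proportion) for rows outside [0, 1]."""
    suspects = []
    for row in csv.DictReader(handle, delimiter="\t"):
        proportion = float(row["Proportion"])
        if proportion < 0.0 or proportion > 1.0:
            suspects.append((row["Identifier"], row["Name"], proportion))
    return suspects


suspects = find_suspect_rows(io.StringIO(SAMPLE))
for identifier, name, proportion in suspects:
    print(identifier, name, proportion)
```

To check a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper.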

Expectations

Be able to get the "levelled" GO without weird numbers. In fact, what I need is a file like 0.Profile.GeneOntologyDomain=biological_process.tsv but with level information.

fredericraymond avatar Feb 17 '13 20:02 fredericraymond

There was a discussion last week on the mailing list about proportions exceeding 100% for the files at specific depth.

http://permalink.gmane.org/gmane.science.biology.ray-genome-assembler/406

This happens because EMBL_CDS can annotate any k-mer with several ontology terms that all lie on the same path from the root to a particular term in the Gene Ontology directed acyclic graph, so the recursive counts at a given depth can count the same k-mer more than once.
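A toy illustration of that effect, in Python. This is not Ray's code: the three-term ontology, the term names, the k-mer annotations, and the rule "recursive count = direct count plus the direct counts of all descendants" are all assumptions made for the example. It shows only how a proportion can exceed 1. It does not explain the negative values.

```python
# Hypothetical three-term ontology: each term maps to its direct children.
CHILDREN = {
    "root": ["parent_term"],
    "parent_term": ["child_term"],
    "child_term": [],
}

# Hypothetical annotations: two of the three k-mers carry both a term and
# its ancestor, i.e. terms lying on the same root-to-term path.
KMER_TERMS = {
    "kmer1": ["parent_term", "child_term"],
    "kmer2": ["parent_term", "child_term"],
    "kmer3": ["child_term"],
}


def direct_counts(kmer_terms):
    """Count how many k-mers are annotated directly with each term."""
    counts = {}
    for terms in kmer_terms.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return counts


def descendants(term, children):
    """Collect every term reachable below `term`, each one only once."""
    seen = set()
    stack = list(children[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen


def recursive_count(term, children, direct):
    """Direct count of `term` plus the direct counts of its descendants."""
    below = descendants(term, children)
    return direct.get(term, 0) + sum(direct.get(d, 0) for d in below)


direct = direct_counts(KMER_TERMS)
total = len(KMER_TERMS)  # three distinct k-mers
recursive = recursive_count("parent_term", CHILDREN, direct)
proportion = recursive / total
print(recursive, total, proportion)
```

In this sketch `kmer1` and `kmer2` are each counted twice for `parent_term`, once directly and once through `child_term`, so the recursive count (5) is larger than the number of k-mers (3).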

For Gene Ontology, only Terms.xml and Terms.tsv are documented in Documentation/.

In the Genome Biology paper, Terms.xml was used.

This ticket (and the issue reported on the mailing list) will likely be resolved by removing these files for particular levels because recursive counts are not useful here.

sebhtml avatar Feb 18 '13 04:02 sebhtml

This means that if I parse Terms.xml I will get the correct information?

fredericraymond avatar Feb 18 '13 15:02 fredericraymond

Terms.tsv contains the same information that is in Terms.xml.

These, however, are not recursive counts.

sebhtml avatar Feb 18 '13 16:02 sebhtml

Evaluation: 5 human-hours

This is presumably a WONTFIX, see above.

sebhtml avatar Apr 17 '13 15:04 sebhtml