GPAD output from ontobio has evidence codes instead of ECO class IDs

Open dougli1sqrd opened this issue 5 years ago • 3 comments

WB	WBGene00011392	involved_in	GO:0010466	PMID:9726255|WB_REF:WBPaper00003188	IDA			20090318	WB

The line above is an example of a GPAD line from wb.gpad.

The evidence column (column 6) contains the GAF evidence code IDA; it should contain an ECO ID instead.
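To make the expected output concrete, here is a minimal sketch of the conversion. It is not ontobio's actual implementation: the function name `evidence_to_eco` and the two-entry mapping table are assumptions for illustration, and a real fix would load the full GO evidence-code-to-ECO mapping rather than hard-code it.

```python
# Partial, illustrative mapping only (assumption: a real implementation
# would load the complete evidence-code-to-ECO table, not hard-code it).
EVIDENCE_TO_ECO = {
    "IDA": "ECO:0000314",
    "IBA": "ECO:0000318",
}

# GPAD columns are tab-separated; the evidence is the 6th column (index 5).
EVIDENCE_COLUMN = 5


def evidence_to_eco(gpad_line):
    """Return the GPAD line with its evidence code replaced by an ECO ID."""
    fields = gpad_line.rstrip("\n").split("\t")
    code = fields[EVIDENCE_COLUMN]
    if code.startswith("ECO:"):
        # Already an ECO ID; leave the line unchanged.
        return "\t".join(fields)
    # Raises KeyError for codes missing from the partial table above.
    fields[EVIDENCE_COLUMN] = EVIDENCE_TO_ECO[code]
    return "\t".join(fields)


# The example line from this issue.
line = (
    "WB\tWBGene00011392\tinvolved_in\tGO:0010466\t"
    "PMID:9726255|WB_REF:WBPaper00003188\tIDA\t\t\t20090318\tWB"
)
print(evidence_to_eco(line))
```

Raising on an unmapped code is a deliberate choice in this sketch, so that an unknown evidence code fails loudly instead of leaking into the GPAD.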

dougli1sqrd avatar Jul 18 '18 22:07 dougli1sqrd

https://github.com/biolink/ontobio/pull/202

dougli1sqrd avatar Jul 19 '18 19:07 dougli1sqrd

The number of distinct gene IDs is very close between the GAF and the GPAD:

edouglass@Erics-MBP:~/lbl/geneontology/pipeline[testpypi_master ?]$ curl -L http://skyhook.berkeleybop.org/testpypi_master/annotations/wb.gaf.gz | gzip -dcf | cut -f 2 | sort | uniq | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2043k  100 2043k    0     0   109k      0  0:00:18  0:00:18 --:--:-- 66616
   14090
edouglass@Erics-MBP:~/lbl/geneontology/pipeline[testpypi_master ?]$ curl -L http://skyhook.berkeleybop.org/testpypi_master/annotations/wb.gpad.gz | gzip -dcf | cut -f 2 | sort | uniq | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1493k  100 1493k    0     0   107k      0  0:00:13  0:00:13 --:--:--  193k
   14079

Also, as far as IBA vs. ECO:0000318 goes:

edouglass@Erics-MBP:~/lbl/geneontology/pipeline[testpypi_master ?]$ curl -L http://skyhook.berkeleybop.org/testpypi_master/annotations/wb.gpad.gz | gzip -dcf | grep ECO:0000318 | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1493k  100 1493k    0     0   351k      0  0:00:04  0:00:04 --:--:--  363k
   24087
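The grep above matches ECO:0000318 anywhere on a line. A stricter check would tally only the evidence column, which also shows at a glance whether any raw evidence codes such as IDA or IBA remain. The sketch below assumes tab-separated GPAD with the evidence in column 6; the function name `evidence_counts` and the sample rows are hypothetical, made up for illustration apart from the line quoted in this issue.

```python
from collections import Counter


def evidence_counts(lines):
    """Tally the values of the evidence column (6th field) of GPAD lines."""
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5:
            counts[fields[5]] += 1
    return counts


# Hypothetical sample input; only the last row is the line from this issue.
sample = [
    "!gpa-version: 1.1\n",
    "WB\tWBGene00000001\tinvolved_in\tGO:0008150\tPMID:1\tECO:0000318\t\t\t20180101\tGO_Central\n",
    "WB\tWBGene00011392\tinvolved_in\tGO:0010466\tPMID:9726255|WB_REF:WBPaper00003188\tIDA\t\t\t20090318\tWB\n",
]

counts = evidence_counts(sample)
# Any key that does not start with "ECO:" is an unconverted evidence code.
unconverted = {code: n for code, n in counts.items() if not code.startswith("ECO:")}
print(counts)
print(unconverted)
```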

@kltm what do you think?

dougli1sqrd avatar Jul 24 '18 18:07 dougli1sqrd

Looks good from over here:

sjcarbon@moiraine:/tmp$:) zcat wb.gpad.gz | grep -v "^!" | cut -f 2 | sort | uniq | wc
  14078   14078  210682
sjcarbon@moiraine:/tmp$:) zcat wb.gaf.gz | grep -v "^!" | cut -f 2 | sort | uniq | wc
  14078   14078  210682
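The commands above drop header lines with grep -v "^!" before counting. The earlier counts (14090 and 14079) were taken without that filter, and cut passes lines containing no tab through unchanged, so header lines are a likely source of the small difference. A rough Python equivalent of the filtered count, assuming gzip-compressed, tab-separated files with the gene ID in column 2 (the function names are hypothetical):

```python
import gzip


def distinct_ids(lines):
    """Collect the distinct values of column 2, ignoring '!' header lines."""
    ids = set()
    for line in lines:
        if line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            ids.add(fields[1])
    return ids


def distinct_ids_in_file(path):
    """Same as distinct_ids, reading from a gzip-compressed GAF or GPAD file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return distinct_ids(handle)


# Hypothetical sample lines: one header and two annotations for the same gene.
sample_lines = [
    "!gaf-version: 2.1\n",
    "WB\tWBGene00011392\tinvolved_in\tGO:0010466\n",
    "WB\tWBGene00011392\tinvolved_in\tGO:0008150\n",
]
print(len(distinct_ids(sample_lines)))
```

Comparing `distinct_ids_in_file("wb.gaf.gz")` with `distinct_ids_in_file("wb.gpad.gz")` as sets would also show which IDs differ, not just how many.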

kltm avatar Jul 24 '18 20:07 kltm