ontobio

Annotation parsing for large files can become intractable

Open · kltm opened this issue 4 years ago · 0 comments

The case we ran into here was parsing the goa_uniprot_all.gaf.gz file for the GO. Specifically:

```
ontobio-parse-assocs.py -f /opt/go-site/annotations/goa_uniprot_all.gaf -F gaf \
    -o /opt/go-site/annotations_new/goa_uniprot_all.gaf \
    -I /opt/go-site/gaferencer-products/all.gaferences.json \
    --report-md /tmp/report.md --report-json /tmp/report.json \
    convert --to gaf
```

On a large server, we killed the run after more than 24 hours. By then it had consumed over 500 GB of RAM, with no end in sight.

There are several theories about the cause. We may also want to simply run it under a Python profiler to see where the time and memory are actually going.
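For the profiler route, here is a minimal sketch using only the standard library's `cProfile`, `pstats` and `tracemalloc` modules. It assumes the expensive step can be wrapped as a Python callable. `profile_call` and `count_gaf_rows` are hypothetical helpers written for illustration, not part of ontobio; in practice you would pass ontobio's real parsing entry point instead of the stand-in.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile and tracemalloc.

    Returns (result, report, peak_bytes): report lists the `top` functions
    by cumulative time; peak_bytes is the peak Python-level allocation
    traced while the call ran.
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    try:
        profiler.enable()
        try:
            result = func(*args, **kwargs)
        finally:
            profiler.disable()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue(), peak_bytes


def count_gaf_rows(lines):
    # Hypothetical stand-in for the real parser: skips "!" header lines and
    # counts rows with at least 15 tab-separated columns, one line at a time.
    n = 0
    for line in lines:
        if line.startswith("!"):
            continue
        if len(line.rstrip("\n").split("\t")) >= 15:
            n += 1
    return n


if __name__ == "__main__":
    sample = ["!gaf-version: 2.2\n"] + ["\t".join(["x"] * 17) + "\n"] * 1000
    rows, report, peak = profile_call(count_gaf_rows, sample)
    print(f"rows={rows} peak={peak / 1e6:.2f} MB")
    print(report)
```

Since `tracemalloc` adds overhead and only sees Python-level allocations, it is probably worth profiling a truncated slice of the GAF first rather than the full file. For a whole-script view without code changes, `python -m cProfile -o parse.prof <script> ...` followed by `pstats` on the output file is another option.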

Tagging @dougli1sqrd @dustine32

kltm, Jul 02 '20 22:07