ontobio
Annotation parsing for large files can become intractable
The case we ran into here was parsing the goa_uniprot_all.gaf.gz file for the GO. Specifically:
ontobio-parse-assocs.py -f /opt/go-site/annotations/goa_uniprot_all.gaf -F gaf -o /opt/go-site/annotations_new/goa_uniprot_all.gaf -I /opt/go-site/gaferencer-products/all.gaferences.json --report-md /tmp/report.md --report-json /tmp/report.json convert --to gaf
On a large server, we killed this run after more than 24 hours, by which point it had consumed 500 GB of RAM with no end in sight.
There are several theories about the cause. We may also want to simply run this under a Python profiler to see where the issue is hiding.
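As a starting point for the profiling idea, here is a minimal sketch using only the standard library (`cProfile`, `pstats`, `tracemalloc`). Note that `parse_lines` is a hypothetical stand-in and not ontobio's actual parser, and `profile_call` is a helper name made up for this example. The assumption is that we would swap in the real parsing entry point and feed it a truncated slice of the GAF so the run finishes in reasonable time.

```python
import cProfile
import io
import pstats
import tracemalloc


def parse_lines(lines):
    """Hypothetical stand-in for the real parser: split tab-separated rows,
    skipping '!' header/comment lines."""
    return [line.rstrip("\n").split("\t") for line in lines if not line.startswith("!")]


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and tracemalloc.

    Returns (result, cpu_report, peak_bytes), where cpu_report is the
    pstats table sorted by cumulative time and peak_bytes is the peak
    memory traced by tracemalloc during the call.
    """
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue(), peak


if __name__ == "__main__":
    # Small synthetic sample; a real run would read the first N lines of the GAF.
    sample = ["!gaf-version: 2.1\n"] + [
        f"UniProtKB\tP{i:05d}\tGENE{i}\n" for i in range(1000)
    ]
    rows, cpu_report, peak = profile_call(parse_lines, sample)
    print(cpu_report)
    print(f"rows parsed: {len(rows)}, peak traced memory: {peak} bytes")
```

Running the same helper on slices of increasing size (say 10k, 100k, 1M lines) and comparing peak memory should show whether memory grows linearly with input or worse, which would help narrow down the theories.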
Tagging @dougli1sqrd @dustine32