Build KG2.7.6
1. Partial Build and load KG2:
- [x] Run a dry build using `bash -x ~/kg2-code/build-kg2-snakemake.sh -n`
- [x] Check `~/kg2-build/build-kg2-snakemake-n.log` to ensure all expected rules are included
- [x] Initiate a screen session: `screen -S buildkg2`
- [x] Start the build: `bash -x ~/kg2-code/build-kg2-snakemake.sh`
- [x] Verify the build completed by checking `~/kg2-build/build-kg2-snakemake.log`
- [x] Check the build version number in `~/kg2-build/kg2-version.txt`
- [x] Check the report file `kg2-simplified-report.json`; compare it against the previous `kg2-simplified-report.json` to identify any major changes
- [x] Tag the build with the version number
- [x] Generate `nodes.tsv` and `edges.tsv` by running `python3 kg2_json_to_kgx_tsv.py kg2-simplified.json`
- [x] Generate `content-metadata.json` on the build instance
- [x] Push `nodes.tsv`, `edges.tsv`, and `content-metadata.json` to the public S3 bucket with `aws s3 cp /file/name s3://rtx-kg2-public`
- [x] Find an available kg2endpoint by checking `rtx.ai` under `Networking` on Lightsail
- [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  - [x] Update code on the kg2endpoint, then run `setup-kg2-neo4j.sh` if necessary
  - [x] Load KG2 into Neo4j: `RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1`
- [x] Update `kg2-versions.md`
Because the error was in a few UBERON predicates in `predicate-remap.yaml`, we don't want to rerun the extraction processes.
This build will also address issue 196, though, so I will re-extract NCBI.
I removed `kg2-simplified.json` to trigger the Simplify rule and its downstream rules again.
I removed `/ncbigene` in order to extract the updated NCBI ontology.
I removed `kg2-ont.json` to run rule Ontologies_and_TTL again.
I'm not sure which additional files need to be deleted to run a partial build, so I am testing with a dry run to verify which rules this triggers.
The dry run shows that the expected rules will run, so I am moving forward with a test run.
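The dry-run check in the checklist can be partly automated. This is a minimal sketch, assuming that Snakemake's dry-run output lists each scheduled job under a header line of the form `rule <Name>:` (or `localrule <Name>:` in newer Snakemake versions); that log format and any expected-rule set you pass in are assumptions, not something this issue specifies.

```python
import re
from typing import Iterable, Set

# Assumed header format for each scheduled job in a Snakemake dry-run log,
# e.g. "rule Simplify:" or "localrule Finish:". Lines such as
# "Error in rule KEGG_Conversion:" or "+ snakemake ..." (bash -x trace) do not match.
RULE_HEADER = re.compile(r"^\s*(?:local)?rule (\w+):\s*$")


def scheduled_rules(log_lines: Iterable[str]) -> Set[str]:
    """Collect the rule names that appear as job headers in a dry-run log."""
    rules: Set[str] = set()
    for line in log_lines:
        match = RULE_HEADER.match(line)
        if match:
            rules.add(match.group(1))
    return rules


def missing_rules(log_lines: Iterable[str], expected: Iterable[str]) -> Set[str]:
    """Return the expected rules that the dry run did not schedule."""
    return set(expected) - scheduled_rules(log_lines)
```

For example, pass an open handle to `~/kg2-build/build-kg2-snakemake-n.log` together with a hypothetical expected set such as `{"Simplify", "Ontologies_and_TTL", "Finish"}`; an empty result means every expected rule was scheduled. The sketch does not flag unexpected extra rules, so the log still deserves a quick read.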
First issue:
[Fri May 6 19:09:06 2022]
Error in rule UniProtKB_Conversion:
jobid: 8
output: /home/ubuntu/kg2-build/kg2-uniprotkb.json
log: /home/ubuntu/kg2-build/uniprotkb-dat-to-json.log (check log file(s) for error message)
shell:
/home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/uniprotkb_dat_to_json.py /home/ubuntu/kg2-build/uniprotkb/uniprot_sprot.dat /home/ubuntu/kg2-build/kg2-uniprotkb.json > /home/ubuntu/kg2-build/uniprotkb-dat-to-json.log 2>&1
(exited with non-zero exit code)
More info in log:
Traceback (most recent call last):
File "/home/ubuntu/kg2-code/uniprotkb_dat_to_json.py", line 353, in <module>
test_mode)
File "/home/ubuntu/kg2-code/uniprotkb_dat_to_json.py", line 122, in parse_records_from_uniprot_dat
return [record_list, update_date, version]
UnboundLocalError: local variable 'update_date' referenced before assignment
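An `UnboundLocalError` on `update_date` means the code path that assigns it never ran, which would happen if `uniprot_sprot.dat` lacked the expected content, for example because it was empty or truncated. That cause is an assumption, not something the log confirms. A cheap guard is to check the conversion inputs before the build; this is a minimal sketch of such a check, not part of the KG2 code.

```python
import os
from typing import Iterable, List


def empty_or_missing(paths: Iterable[str]) -> List[str]:
    """Return the input paths that are missing, not regular files, or zero bytes long."""
    problems: List[str] = []
    for path in paths:
        resolved = os.path.expanduser(path)  # allow "~/kg2-build/..." style paths
        if not os.path.isfile(resolved) or os.path.getsize(resolved) == 0:
            problems.append(path)
    return problems
```

For this build the inputs named in the failing rules are `/home/ubuntu/kg2-build/uniprotkb/uniprot_sprot.dat`, `/home/ubuntu/kg2-build/hmdb_metabolites.xml`, and `/home/ubuntu/kg2-build/kegg.json`. A non-empty but truncated file still passes, so this only catches the most obvious download failures.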
Another error:
[Fri May 6 19:09:05 2022]
Error in rule HMDB_Conversion:
jobid: 18
output: /home/ubuntu/kg2-build/kg2-hmdb.json
log: /home/ubuntu/kg2-build/hmdb-xml-to-kg-json.log (check log file(s) for error message)
shell:
/home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/hmdb_xml_to_kg_json.py /home/ubuntu/kg2-build/hmdb_metabolites.xml /home/ubuntu/kg2-build/kg2-hmdb.json > /home/ubuntu/kg2-build/hmdb-xml-to-kg-json.log 2>&1
With additional info in the log:
Traceback (most recent call last):
File "/home/ubuntu/kg2-code/hmdb_xml_to_kg_json.py", line 625, in <module>
metabolite_data = xmltodict.parse(xml_file.read())
File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/xmltodict.py", line 327, in parse
parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0
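Expat reports `no element found: line 1, column 0` when the document ends before any root element appears, so this traceback most likely means `hmdb_metabolites.xml` was empty when `xmltodict.parse` read it (the log doesn't show the file size, so that is an inference). The snippet below calls the standard-library `xml.parsers.expat` parser directly, which the traceback shows `xmltodict.parse` using via `parser.Parse(xml_input, True)`, to demonstrate the same message for empty input.

```python
import xml.parsers.expat


def expat_error_for(data: bytes) -> str:
    """Parse `data` as a complete XML document; return the Expat error text, or "" on success."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final (and only) chunk
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return ""
```

Calling `expat_error_for(b"")` reproduces the HMDB message, while any well-formed document returns an empty string.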
Another error, in rule KEGG_Conversion:
[Fri May 6 19:09:06 2022]
Error in rule KEGG_Conversion:
jobid: 26
output: /home/ubuntu/kg2-build/kg2-kegg.json
shell:
/home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/kegg_json_to_kg_json.py /home/ubuntu/kg2-build/kegg.json /home/ubuntu/kg2-build/kg2-kegg.json
(exited with non-zero exit code)
Another error, in rule Finish:
Error in rule Finish:
jobid: 0
RuleException:
AttributeError in line 13 of /home/ubuntu/kg2-code/Snakefile:
'InputFiles' object has no attribute 'simplified_output_nodes_file_full'
File "/home/ubuntu/kg2-code/Snakefile", line 13, in __rule_Finish
File "/usr/lib/python3.7/string.py", line 186, in format
File "/usr/lib/python3.7/string.py", line 190, in vformat
File "/usr/lib/python3.7/string.py", line 230, in _vformat
File "/usr/lib/python3.7/string.py", line 301, in get_field
File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
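The Finish traceback runs through `string.Formatter.get_field`, which resolves a placeholder such as `{input.simplified_output_nodes_file_full}` by looking the attribute up on the object passed as `input`. If rule Finish's `input:` section has no entry with that name, the lookup fails with this kind of `AttributeError`. The minimal reproduction below uses `types.SimpleNamespace` as a hypothetical stand-in for Snakemake's `InputFiles` object; it is not Snakemake's own code. If this is the cause, the fix would be in the Snakefile: either add the named input to rule Finish or correct the placeholder in its shell command.

```python
import string
from types import SimpleNamespace
from typing import Optional


def missing_field(template: str, **namespaces: object) -> Optional[str]:
    """Format `template` with string.Formatter; return the AttributeError text if a field is missing."""
    try:
        string.Formatter().format(template, **namespaces)
    except AttributeError as err:
        return str(err)
    return None


# Hypothetical stand-in for rule Finish's named inputs: the attribute the
# shell command refers to is deliberately absent.
finish_input = SimpleNamespace(some_other_input="placeholder.json")
```

Here `missing_field("{input.simplified_output_nodes_file_full}", input=finish_input)` returns an error message naming the missing attribute, while a placeholder that matches a defined attribute formats cleanly and returns `None`.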
Problems with edges:
There are no edges from infores:intact in this build. There were 271577 in the previous build.
There are no edges from infores:ncbi-taxonomy in this build. There were 3552996 in the previous build.
I re-ran the IntAct conversion, and the result looks better, but I'm still not sure what's going on with NCBI Taxon and Taxonomy.
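Drops like these are what the previous-vs-current report comparison in the checklist is meant to surface. This is a minimal sketch, assuming you have already pulled a mapping from knowledge source (e.g. `infores:intact`) to edge count out of each `kg2-simplified-report.json`; which key holds that mapping in the report isn't stated here, so extracting it is left to the caller. The 50% threshold is an arbitrary choice.

```python
from typing import Dict, List, Tuple


def count_changes(previous: Dict[str, int], current: Dict[str, int],
                  threshold: float = 0.5) -> List[Tuple[str, int, int]]:
    """Flag sources that vanished, appeared, or changed by more than `threshold`
    (as a fraction of the previous count). Returns (source, before, after) tuples."""
    flagged: List[Tuple[str, int, int]] = []
    for source in sorted(set(previous) | set(current)):
        before = previous.get(source, 0)
        after = current.get(source, 0)
        if before == 0:
            if after > 0:  # source is new in this build
                flagged.append((source, before, after))
        elif after == 0 or abs(after - before) / before > threshold:
            flagged.append((source, before, after))
    return flagged
```

With the counts above, a previous mapping containing `infores:intact: 271577` and `infores:ncbi-taxonomy: 3552996` and a current mapping lacking both would flag both sources as dropping to zero.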
The wrong version is showing in `s3://rtx-kg2-public/kg2-version.txt`: it says 2.7.7 instead of 2.7.6.
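A small check could catch a mismatch like this before files are pushed. This is a minimal sketch, assuming the first non-blank line of `kg2-version.txt` holds the version string, which this issue doesn't confirm. The uploaded copy can be printed with `aws s3 cp s3://rtx-kg2-public/kg2-version.txt -` and its text passed in.

```python
def read_version(text: str) -> str:
    """Return the first non-blank line of a version file, stripped (assumed to be the version)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def check_version(text: str, expected: str) -> None:
    """Raise ValueError if the version file does not report `expected`."""
    found = read_version(text)
    if found != expected:
        raise ValueError(f"kg2-version.txt reports {found!r}, expected {expected!r}")
```

Running `check_version(text, "2.7.6")` on the S3 copy described above would raise, which is the point: it turns the manual checklist step into a hard stop.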
`kg2-simplified-report.json` no longer has any `rename:` predicates.
Loading KG2.7.6 onto KG2endpoint3
Pushed KG2.7.6 tag