Build KG2.8.4
1. Build and load KG2:
   - [x] Clone the RTX repo from GitHub: `git clone https://github.com/RTXteam/RTX-KG2.git`
   - [x] Clear the instance using `bash -x clear-instance.sh`
   - [x] Set up the KG2 build system: `bash -x RTX-KG2/setup-kg2-build.sh`
   - [x] Check `~/kg2-build/setup-kg2-build.log` to ensure setup completed successfully
   - [x] Run a dry build using `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n`
   - [x] Check `~/kg2-build/build-kg2-snakemake-n.log` to ensure all rules are included
   - [x] Run `touch ~/kg2-build/minor-release` for a minor release or `touch ~/kg2-build/major-release` for a major release. If you don't want to change the version number, skip this step.
   - [x] Initiate a screen session: `screen -S buildkg2`
   - [x] Start the build: `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F`
   - [x] Verify the build completed by checking `~/kg2-build/build-kg2-snakemake.log`
   - [x] Check the build version number in `~/kg2-build/kg2-version.txt`
   - [x] Check the report file `kg2-simplified-report.json`; compare against the previous `kg2-simplified-report.json` to identify any major changes
   - [ ] Generate nodes.tsv and edges.tsv by running `python3 kg2_json_to_kgx_tsv.py kg2-simplified.json`
   - [ ] Generate `content-metadata.json` on the build instance
   - [ ] Push nodes.tsv and edges.tsv to the public S3 bucket with `aws s3 cp /file/name s3://rtx-kg2-public`
   - [x] Find an available kg2endpoint by checking `rtx.ai` under `Networking` on Lightsail
   - [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
     - [x] Update code on kg2endpoint, then run `setup-kg2-neo4j.sh` if necessary
     - [x] Load KG2 into Neo4j: `RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1`
   - [x] Update kg2-versions.md
     - [x] Update version numbers of upstream knowledge sources for the new version of KG2 in `kg2-versions.md` (see Cypher command below).
Example Cypher to get versions of many of the knowledge sources in a specific build of KG2pre:
match (n:`biolink:RetrievalSource`) where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' return n.id, n.name order by n.id;
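In Cypher, `=~` matches the regular expression against the whole string, so the query keeps every `biolink:RetrievalSource` node whose `id` starts with neither `umls_` nor `OBO:`. A minimal Python sketch of the same filter follows; the example ids are made up for illustration and are not taken from a real build.

```python
import re

# Patterns copied from the Cypher query. Cypher's =~ must match the whole
# string, which corresponds to re.fullmatch in Python.
EXCLUDED_PATTERNS = [re.compile(r"umls_.*"), re.compile(r"OBO:.*")]


def keep_source_id(node_id: str) -> bool:
    """Return True if the id would pass the query's WHERE clause."""
    return not any(p.fullmatch(node_id) for p in EXCLUDED_PATTERNS)


# Hypothetical ids, for illustration only.
example_ids = ["umls_source_MSH", "OBO:go.owl", "SEMMEDDB:", "DRUGBANK:"]
kept = sorted(i for i in example_ids if keep_source_id(i))
print(kept)
```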
We're not planning to start this build immediately; I just want to have an issue to start tagging things with.
This error occurred during the build:
[Thu Jul 20 19:34:20 2023]
Error in rule SemMedDB:
jobid: 30
output: /home/ubuntu/kg2-build/semmeddb/kg2-semmeddb-tuplelist.json, /home/ubuntu/kg2-build/semmed-exclude-list.yaml
log: /home/ubuntu/kg2-build/extract-semmeddb.log (check log file(s) for error message)
shell:
bash -x /home/ubuntu/kg2-code/extract-semmeddb.sh /home/ubuntu/kg2-build/semmeddb/kg2-semmeddb-tuplelist.json /home/ubuntu/kg2-build/semmed-exclude-list.yaml > /home/ubuntu/kg2-build/extract-semmeddb.log 2>&1
(exited with non-zero exit code)
It is covered in more detail here: https://github.com/RTXteam/RTX-KG2/issues/294#issuecomment-1644489313
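Failures like the one above show up in the Snakemake log as an `Error in rule <name>:` line. As a rough sketch (the helper name `failed_rules` is hypothetical and not part of the KG2 code), the failed rules can be pulled out of a log like this:

```python
import re
from typing import List

# Snakemake reports a failed rule with a line of the form "Error in rule <name>:".
ERROR_RULE_RE = re.compile(r"^\s*Error in rule (\S+):\s*$", re.MULTILINE)


def failed_rules(log_text: str) -> List[str]:
    """Return the names of rules reported as failed, in order of appearance."""
    return ERROR_RULE_RE.findall(log_text)


# Excerpt of the log shown above.
sample = """[Thu Jul 20 19:34:20 2023]
Error in rule SemMedDB:
    jobid: 30
    log: /home/ubuntu/kg2-build/extract-semmeddb.log (check log file(s) for error message)
"""
print(failed_rules(sample))
```

In practice the text would come from reading `~/kg2-build/build-kg2-snakemake.log`; each reported rule also names its own log file, which is the next place to look.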
There was an error with SMPDB, which doesn't make sense since it was recently tested and worked the first time around:
[Thu Jul 20 21:02:58 2023]
Error in rule SMPDB:
jobid: 38
output: /home/ubuntu/kg2-build/smpdb/pathbank_pathways.csv
log: /home/ubuntu/kg2-build/extract-smpdb.log (check log file(s) for error message)
shell:
bash -x /home/ubuntu/kg2-code/extract-smpdb.sh /home/ubuntu/kg2-build/smpdb > /home/ubuntu/kg2-build/extract-smpdb.log 2>&1
(exited with non-zero exit code)
ubuntu@ip-172-31-62-73:~/kg2-build$ curl -L -f -k /home/ubuntu/kg2-build/smpdb/ https://pathbank.org/downloads/pathbank_all_pwml.zip
curl: (3) URL using bad/illegal format or missing URL
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
I am going to try restarting to see if that fixes it.
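One thing worth checking in the manual command above: curl treats every positional argument as a URL, so `/home/ubuntu/kg2-build/smpdb/` is parsed as a malformed URL, which matches the `curl: (3)` message. With no `--output` given, the zip would be written to the terminal, which matches the binary-output warning. Whether this is also what failed inside `extract-smpdb.sh` is not established here. A small sketch, assuming the intent was to save the zip into that directory (`build_curl_command` is a hypothetical helper, not part of the KG2 code):

```python
import posixpath
from urllib.parse import urlparse


def build_curl_command(url: str, dest_dir: str) -> list:
    """Build a curl argv that saves `url` into `dest_dir` under its remote file name."""
    file_name = posixpath.basename(urlparse(url).path)
    if not file_name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    dest = posixpath.join(dest_dir, file_name)
    # -L follows redirects, -f fails on HTTP errors, -k skips TLS verification
    # (flags carried over from the command above); --output names the local file.
    return ["curl", "-L", "-f", "-k", "--output", dest, url]


cmd = build_curl_command(
    "https://pathbank.org/downloads/pathbank_all_pwml.zip",
    "/home/ubuntu/kg2-build/smpdb/",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to perform the download.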
An error occurred in Ontologies and TTL, due to #303:
Traceback (most recent call last):
File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 1391, in <module>
save_pickle)
File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 142, in make_kg2
assert os.path.exists(ont_source_info_dict['file']), local_file_name
AssertionError: foodon.owl
`download` in `ont-load-inventory.yaml` needed to be set to `true` so that the correct `if` branch is taken and the pickle file is used. A better solution should be found in the future.
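As one possible direction, a pre-flight check could list every missing ontology file at once and name it in the error, rather than stopping at the first bare `AssertionError`. The sketch below is hypothetical: `missing_files` is not part of `multi_ont_to_json_kg.py`, and the caller is assumed to already have the list of expected file paths.

```python
import os
import tempfile
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of `paths` that does not exist on disk, preserving order."""
    return [p for p in paths if not os.path.exists(p)]


# Demonstration with a temporary directory: one file present, one absent.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.owl")
    absent = os.path.join(tmp, "foodon.owl")
    open(present, "w").close()
    missing = missing_files([present, absent])
    print([os.path.basename(p) for p in missing])
```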
Here were the major report changes:
There was a significant drop in edges from infores:fma-umls in this build. The count dropped from 368827 to 290475.
There was a significant drop in edges from infores:go in this build. The count dropped from 202983 to 132359.
There was a significant drop in edges from infores:hgnc in this build. The count dropped from 42515 to 23262.
There are no edges from infores:loinc-umls in this build. There were 2690586 in the previous build.
There was a significant drop in edges from infores:ncbi-taxon in this build. The count dropped from 3971385 to 1393247.
There are no edges from infores:vandf-umls in this build. There were 140078 in the previous build.
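A comparison of this kind can be sketched as follows. This is an illustration only, not the script that produced the messages above: it assumes the per-source edge counts have already been read from the two `kg2-simplified-report.json` files into plain dictionaries, and the 20% `drop_threshold` is an assumed cutoff, not the project's actual criterion.

```python
from typing import Dict, List


def compare_edge_counts(prev: Dict[str, int], curr: Dict[str, int],
                        drop_threshold: float = 0.2) -> List[str]:
    """Describe sources whose edges vanished or fell by more than drop_threshold."""
    messages = []
    for source in sorted(prev):
        before = prev[source]
        after = curr.get(source, 0)
        if before == 0:
            continue  # no edges previously, so nothing could have been lost
        if after == 0:
            messages.append(
                f"There are no edges from {source} in this build. "
                f"There were {before} in the previous build.")
        elif (before - after) / before > drop_threshold:
            messages.append(
                f"There was a significant drop in edges from {source} in this build. "
                f"The count dropped from {before} to {after}.")
    return messages


# Counts taken from two of the report changes listed above.
previous = {"infores:go": 202983, "infores:loinc-umls": 2690586}
current = {"infores:go": 132359}
for line in compare_edge_counts(previous, current):
    print(line)
```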