Build KG2.9.1
1. Build and load KG2:
- [x] Create new build instance `kg291build.rtx.ai`
- [x] Clone the RTX repo from GitHub `git clone https://github.com/RTXteam/RTX-KG2.git`
- [x] Set up the KG2 build system `bash -x RTX-KG2/setup-kg2-build.sh`
- [x] Check `~/kg2-build/setup-kg2-build.log` to ensure setup completed successfully
- [x] Run a dry build using `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n`
- [x] Check `~/kg2-build/build-kg2-snakemake-n.log` to ensure all rules are included
- [x] Run `touch ~/kg2-build/minor-release` for a minor release or `touch ~/kg2-build/major-release` for a major release. If you don't want to change the version number, skip this step.
- [x] Initiate a screen session `screen -S buildkg2`
- [x] Start the build `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F`
- [x] Verify build completed by checking `~/kg2-build/build-kg2-snakemake.log`
- [x] Check the build version number in `~/kg2-build/kg2-version.txt`
- [x] Check report file `kg2-simplified-report.json`; compare against the previous `kg2-simplified-report.json` to identify any major changes
- [ ] Generate nodes.tsv and edges.tsv by running `python3 kg2_json_to_kgx_tsv.py kg2-simplified.json`
- [ ] Generate `content-metadata.json` on build instance
- [ ] Push nodes.tsv and edges.tsv to public S3 bucket with `aws s3 cp /file/name s3://rtx-kg2-public`
- [x] Find an available kg2endpoint by checking `rtx.ai` under `Networking` on Lightsail
- [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  - [x] Update code on kg2endpoint, then run setup-kg2-neo4j.sh if necessary
  - [x] Load KG2 into Neo4j `RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1`
- [ ] Update kg2-versions.md
- [ ] Create a new DNS CNAME record with CNAME `kg2endpoint-kg2-X-Y.rtx.ai` pointing to the hostname for the Neo4j endpoint (which might be something like `kg2endpoint3.rtx.ai`).
- [ ] Update version numbers of upstream knowledge sources for the new version of KG2 in `kg2-versions.md` (see Cypher command below).
Example Cypher to get versions of many of the knowledge sources in a specific build of KG2pre:
match (n:`biolink:InformationResource`) where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' return n.id, n.name order by n.id;
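If Neo4j is not up yet, the same filter can be approximated offline. A minimal sketch, assuming the nodes have already been read into dicts with `id`, `name`, and `category` keys (that layout and the function name are assumptions, not the KG2 schema). Note that Cypher's `=~` has to match the whole string, hence `fullmatch`.

```python
import re

# Same exclusions as the Cypher query: ids matching 'umls_.*' or 'OBO:.*'.
EXCLUDED_ID_PATTERNS = [re.compile(r"umls_.*"), re.compile(r"OBO:.*")]


def knowledge_source_versions(nodes):
    """Return (id, name) pairs for InformationResource nodes, ordered by id.

    `nodes` is an iterable of dicts with 'id', 'name' and 'category' keys
    (a hypothetical in-memory layout).
    """
    rows = [
        (node["id"], node["name"])
        for node in nodes
        if node.get("category") == "biolink:InformationResource"
        and not any(p.fullmatch(node["id"]) for p in EXCLUDED_ID_PATTERNS)
    ]
    return sorted(rows, key=lambda row: row[0])
```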
Validation completed, starting partial build
Error in rule DGIdb:
    jobid: 36
    output: /home/ubuntu/kg2-build/dgidb/interactions.tsv
    log: /home/ubuntu/kg2-build/extract-dgidb.log (check log file(s) for error message)
    shell:
        bash -x /home/ubuntu/kg2-code/extract-dgidb.sh /home/ubuntu/kg2-build/dgidb > /home/ubuntu/kg2-build/extract-dgidb.log 2>&1

[Fri Apr 12 19:59:49 2024]
Error in rule Reactome_Conversion:
    jobid: 21
    output: /home/ubuntu/kg2-build/kg2-reactome-nodes.jsonl, /home/ubuntu/kg2-build/kg2-reactome-edges.jsonl
    log: /home/ubuntu/kg2-build/reactome_mysql_to_kg_jsonl.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/reactome_mysql_to_kg_jsonl.py /home/ubuntu/kg2-build/mysql-config.conf reactome /home/ubuntu/kg2-build/kg2-reactome-nodes.jsonl /home/ubuntu/kg2-build/kg2-reactome-edges.jsonl > /home/ubuntu/kg2-build/reactome_mysql_to_kg_jsonl.log 2>&1

Error in rule ChEMBL_Conversion:
    jobid: 11
    output: /home/ubuntu/kg2-build/kg2-chembl-nodes.jsonl, /home/ubuntu/kg2-build/kg2-chembl-edges.jsonl
    log: /home/ubuntu/kg2-build/chembl_mysql_to_kg_jsonl.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/chembl_mysql_to_kg_jsonl.py /home/ubuntu/kg2-build/mysql-config.conf chembl /home/ubuntu/kg2-build/kg2-chembl-nodes.jsonl /home/ubuntu/kg2-build/kg2-chembl-edges.jsonl > /home/ubuntu/kg2-build/chembl_mysql_to_kg_jsonl.log 2>&1
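To triage failures like the ones above without scrolling through the whole Snakemake log, something along these lines could list each failed rule with its log file. This is only a sketch: it assumes every `Error in rule` block carries a `log:` field, as the three errors here do, and the function name is made up.

```python
import re

# "Error in rule <name>:" followed (lazily) by the first "log: <path>" field.
# DOTALL lets the match span the indented lines of a Snakemake error block.
ERROR_PATTERN = re.compile(r"Error in rule (\w+):.*?\blog:\s*(\S+)", re.DOTALL)


def failed_rules(log_text):
    """Return a list of (rule_name, log_path) for each error block found."""
    return [(m.group(1), m.group(2)) for m in ERROR_PATTERN.finditer(log_text)]


if __name__ == "__main__":
    from pathlib import Path

    # Path taken from the checklist; adjust if the build directory differs.
    log_file = Path.home() / "kg2-build" / "build-kg2-snakemake.log"
    for rule, log_path in failed_rules(log_file.read_text()):
        print(f"{rule}: {log_path}")
```

Each path it prints is the per-rule log to open next, as the "check log file(s) for error message" hint suggests.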
Build completed. I am now uploading to Neo4j, testing, and reviewing the reports.
Loaded into Neo4j on kg2endpoint4.rtx.ai
Switched `SEMMEDDB:TREATS` to `biolink:treats_or_applied_or_studied_to_treat`; now rebuilding from Simplify.