Build KG2.9.1

Open acevedol opened this issue 1 year ago • 6 comments

1. Build and load KG2:
  • [x] Create new build instance kg291build.rtx.ai
  • [x] Clone the RTX-KG2 repo from GitHub: git clone https://github.com/RTXteam/RTX-KG2.git
  • [x] Set up the KG2 build system: bash -x RTX-KG2/setup-kg2-build.sh
  • [x] Check ~/kg2-build/setup-kg2-build.log to ensure setup completed successfully
  • [x] Run a dry build using: bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n
  • [x] Check ~/kg2-build/build-kg2-snakemake-n.log to ensure all rules are included
  • [x] Run touch ~/kg2-build/minor-release for a minor release or touch ~/kg2-build/major-release for a major release. If you don't want to change the version number, skip this step.
  • [x] Initiate a screen session: screen -S buildkg2
  • [x] Start the build: bash -x ~/kg2-code/build-kg2-snakemake.sh all -F
  • [x] Verify build completed by checking ~/kg2-build/build-kg2-snakemake.log
  • [x] Check the build version number in ~/kg2-build/kg2-version.txt
  • [x] Check report file kg2-simplified-report.json; compare against previous kg2-simplified-report.json to identify any major changes
  • [ ] Generate nodes.tsv and edges.tsv by running: python3 kg2_json_to_kgx_tsv.py kg2-simplified.json
  • [ ] Generate content-metadata.json on the build instance
  • [ ] Push nodes.tsv and edges.tsv to the public S3 bucket with: aws s3 cp /file/name s3://rtx-kg2-public
  • [x] Find an available kg2endpoint by checking rtx.ai under Networking on Lightsail
  • [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  • [x] Update code on kg2endpoint, then run setup-kg2-neo4j.sh if necessary
  • [x] Load KG2 into Neo4j: RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1
  • [ ] Update kg2-versions.md
  • [ ] Create a new DNS CNAME record, kg2endpoint-kg2-X-Y.rtx.ai, pointing to the hostname of the Neo4j endpoint (for example, kg2endpoint3.rtx.ai).
  • [ ] Update the version numbers of upstream knowledge sources for the new version of KG2 in kg2-versions.md (see the Cypher command below).
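
The report comparison step in the checklist can be sketched in Python. This is a minimal sketch, assuming only that kg2-simplified-report.json is a JSON object whose numeric leaves are counts; the function names, the 10% threshold, and the dotted-path flattening are illustrative and not part of the RTX-KG2 code base.

```python
import json
from pathlib import Path


def flatten_counts(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": number}, keeping only numeric leaves."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_counts(value, path))
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        flat[prefix] = obj
    return flat


def compare_reports(old, new, threshold=0.10):
    """Return (path, old_value, new_value) tuples worth a manual look.

    A count is reported when it exists in only one report or when its
    relative change exceeds `threshold` (an arbitrary, hypothetical cutoff).
    """
    old_flat, new_flat = flatten_counts(old), flatten_counts(new)
    changes = []
    for path in sorted(set(old_flat) | set(new_flat)):
        before, after = old_flat.get(path), new_flat.get(path)
        if before is None or after is None:
            changes.append((path, before, after))
        elif before == 0:
            if after != 0:
                changes.append((path, before, after))
        elif abs(after - before) / abs(before) > threshold:
            changes.append((path, before, after))
    return changes


def compare_report_files(old_path, new_path, threshold=0.10):
    """Load two report files from disk and compare them."""
    old = json.loads(Path(old_path).read_text())
    new = json.loads(Path(new_path).read_text())
    return compare_reports(old, new, threshold)
```

Calling compare_report_files with the previous and the current report paths returns a list that can be printed line by line; an empty list means no count moved by more than the chosen threshold.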

Example Cypher to get versions of many of the knowledge sources in a specific build of KG2pre:

match (n:`biolink:InformationResource`) where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' return n.id, n.name order by n.id;
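
The same query can also be run from Python rather than from a Cypher shell. The sketch below assumes the third-party neo4j Python driver is installed and that the endpoint accepts Bolt connections; the URI, credentials, function names, and the tab-separated output format are all hypothetical and are not taken from the RTX-KG2 tooling.

```python
# Same Cypher as above; the trailing semicolon (a cypher-shell statement
# terminator) is dropped for use through the driver.
QUERY = (
    "match (n:`biolink:InformationResource`) "
    "where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' "
    "return n.id, n.name order by n.id"
)


def fetch_source_versions(uri, user, password):
    """Run QUERY against a Neo4j endpoint and return [(id, name), ...]."""
    # Imported lazily so the formatting helper works without the driver.
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            return [(record["n.id"], record["n.name"]) for record in session.run(QUERY)]


def format_rows(rows):
    """Render (id, name) rows as tab-separated lines for copying into notes."""
    return "\n".join(f"{node_id}\t{name}" for node_id, name in rows)


if __name__ == "__main__":
    # Hypothetical connection details; replace with the real endpoint.
    rows = fetch_source_versions("bolt://localhost:7687", "neo4j", "password")
    print(format_rows(rows))
```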

acevedol • Apr 08 '24 22:04

Validation completed, starting partial build

acevedol • Apr 09 '24 19:04

Error in rule DGIdb:
    jobid: 36
    output: /home/ubuntu/kg2-build/dgidb/interactions.tsv
    log: /home/ubuntu/kg2-build/extract-dgidb.log (check log file(s) for error message)
    shell:
        bash -x /home/ubuntu/kg2-code/extract-dgidb.sh /home/ubuntu/kg2-build/dgidb > /home/ubuntu/kg2-build/extract-dgidb.log 2>&1

[Fri Apr 12 19:59:49 2024]
Error in rule Reactome_Conversion:
    jobid: 21
    output: /home/ubuntu/kg2-build/kg2-reactome-nodes.jsonl, /home/ubuntu/kg2-build/kg2-reactome-edges.jsonl
    log: /home/ubuntu/kg2-build/reactome_mysql_to_kg_jsonl.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/reactome_mysql_to_kg_jsonl.py /home/ubuntu/kg2-build/mysql-config.conf reactome /home/ubuntu/kg2-build/kg2-reactome-nodes.jsonl /home/ubuntu/kg2-build/kg2-reactome-edges.jsonl > /home/ubuntu/kg2-build/reactome_mysql_to_kg_jsonl.log 2>&1

acevedol • Apr 12 '24 20:04

Error in rule ChEMBL_Conversion:
    jobid: 11
    output: /home/ubuntu/kg2-build/kg2-chembl-nodes.jsonl, /home/ubuntu/kg2-build/kg2-chembl-edges.jsonl
    log: /home/ubuntu/kg2-build/chembl_mysql_to_kg_jsonl.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/chembl_mysql_to_kg_jsonl.py /home/ubuntu/kg2-build/mysql-config.conf chembl /home/ubuntu/kg2-build/kg2-chembl-nodes.jsonl /home/ubuntu/kg2-build/kg2-chembl-edges.jsonl > /home/ubuntu/kg2-build/chembl_mysql_to_kg_jsonl.log 2>&1

acevedol • Apr 13 '24 20:04
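
Failures like the ones reported above can be collected from the Snakemake log in one pass. The following is a rough sketch, assuming the log contains "Error in rule NAME:" entries followed by a "log:" field as in the excerpts in this thread; the function name and regular expressions are illustrative and not part of the KG2 build system.

```python
import re

# Matches the header of a failed rule, e.g. "Error in rule DGIdb:".
RULE_RE = re.compile(r"Error in rule (?P<rule>[^:\s]+):")
# Matches the per-rule log field, e.g. "log: /path/to/file.log".
LOG_RE = re.compile(r"\blog:\s*(?P<log>\S+)")


def failed_rules(log_text):
    """Return [(rule_name, rule_log_path), ...] for each failed rule.

    rule_log_path is None when no "log:" field appears before the next
    "Error in rule" entry (or the end of the text).
    """
    failures = []
    matches = list(RULE_RE.finditer(log_text))
    for i, match in enumerate(matches):
        # Limit the search to the text belonging to this error entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        section = log_text[match.end():end]
        log_match = LOG_RE.search(section)
        failures.append((match.group("rule"), log_match.group("log") if log_match else None))
    return failures


if __name__ == "__main__":
    from pathlib import Path

    # Path taken from the checklist; adjust if the build directory differs.
    build_log = Path.home() / "kg2-build" / "build-kg2-snakemake.log"
    for rule, rule_log in failed_rules(build_log.read_text()):
        print(f"{rule}: see {rule_log}")
```

Each returned log path points at the per-rule log file, which is where the actual error message is recorded.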

Build completed. I am now uploading to Neo4j, testing, and reviewing the reports.

acevedol • Apr 17 '24 15:04

Loaded into Neo4j on kg2endpoint4.rtx.ai

acevedol • Apr 23 '24 17:04

Switched SEMMEDDB:TREATS to biolink:treats_or_applied_or_studied_to_treat and am rebuilding from the Simplify step

acevedol • Apr 23 '24 23:04