Build KG2.8.4

Open ecwood opened this issue 2 years ago • 4 comments

1. Build and load KG2:
  • [x] Clone the RTX repo from GitHub git clone https://github.com/RTXteam/RTX-KG2.git
  • [x] Clear the instance using bash -x clear-instance.sh
  • [x] Setup the KG2 build system bash -x RTX-KG2/setup-kg2-build.sh
  • [x] Check ~/kg2-build/setup-kg2-build.log to ensure setup completed successfully
  • [x] Run a dry build using bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n
  • [x] Check ~/kg2-build/build-kg2-snakemake-n.log to ensure all rules are included
  • [x] Run touch ~/kg2-build/minor-release for a minor release or touch ~/kg2-build/major-release for a major release. If you don't want to change the version number, ignore this step.
  • [x] Initiate a screen session screen -S buildkg2
  • [x] Start the build bash -x ~/kg2-code/build-kg2-snakemake.sh all -F
  • [x] Verify build completed by checking ~/kg2-build/build-kg2-snakemake.log
  • [x] Check the build version number in ~/kg2-build/kg2-version.txt
  • [x] Check report file kg2-simplified-report.json; compare against previous kg2-simplified-report.json to identify any major changes
  • [ ] Generate nodes.tsv and edges.tsv by running python3 kg2_json_to_kgx_tsv.py kg2-simplified.json
  • [ ] Generate content-metadata.json on build instance
  • [ ] Push nodes.tsv and edges.tsv to public S3 bucket with aws s3 cp /file/name s3://rtx-kg2-public
  • [x] Find an available kg2endpoint by checking rtx.ai under Networking on Lightsail
  • [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  • [x] Update code on kg2endpoint, then run setup-kg2-neo4j.sh if necessary
  • [x] Load KG2 into Neo4j RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1
  • [x] Update kg2-versions.md
  • [x] Update the version numbers of upstream knowledge sources for the new version of KG2 in kg2-versions.md (see Cypher command below).

Example Cypher to get versions of many of the knowledge sources in a specific build of KG2pre:

match (n:`biolink:RetrievalSource`) where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' return n.id, n.name order by n.id;
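
To make the filter in the query above concrete, here is a minimal Python sketch of the same selection logic. It assumes that Cypher's =~ operator matches the whole string (hence re.fullmatch). The function name and the sample IDs are hypothetical and are not taken from a real build.

```python
import re

# Patterns excluded by the Cypher query (=~ is assumed to match the whole string).
EXCLUDED_PATTERNS = [re.compile(r"umls_.*"), re.compile(r"OBO:.*")]


def keep_source(node_id):
    """Return True if the query's WHERE clause would keep this node ID."""
    return not any(p.fullmatch(node_id) for p in EXCLUDED_PATTERNS)


# Hypothetical RetrievalSource IDs, for illustration only.
sample_ids = ["umls_source_a", "OBO:example.owl", "example_source", "another:source"]

# Mirror the query's "order by n.id" with a plain sort.
kept = sorted(i for i in sample_ids if keep_source(i))
print(kept)
```

Only IDs that begin with umls_ or OBO: are dropped; an ID that merely contains one of those strings later on is kept.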

We're not planning to start this build immediately; I just want to have an issue to start tagging things with.

ecwood, Jul 07 '23 17:07

This error occurred during the build:

[Thu Jul 20 19:34:20 2023]
Error in rule SemMedDB:
    jobid: 30
    output: /home/ubuntu/kg2-build/semmeddb/kg2-semmeddb-tuplelist.json, /home/ubuntu/kg2-build/semmed-exclude-list.yaml
    log: /home/ubuntu/kg2-build/extract-semmeddb.log (check log file(s) for error message)
    shell:
        bash -x /home/ubuntu/kg2-code/extract-semmeddb.sh /home/ubuntu/kg2-build/semmeddb/kg2-semmeddb-tuplelist.json /home/ubuntu/kg2-build/semmed-exclude-list.yaml  > /home/ubuntu/kg2-build/extract-semmeddb.log 2>&1
        (exited with non-zero exit code)

It is covered in more detail here: https://github.com/RTXteam/RTX-KG2/issues/294#issuecomment-1644489313

ecwood, Jul 20 '23 19:07

There was an error with SMPDB, which doesn't make sense since it was recently tested and worked the first time around:

[Thu Jul 20 21:02:58 2023]
Error in rule SMPDB:
    jobid: 38
    output: /home/ubuntu/kg2-build/smpdb/pathbank_pathways.csv
    log: /home/ubuntu/kg2-build/extract-smpdb.log (check log file(s) for error message)
    shell:
        bash -x /home/ubuntu/kg2-code/extract-smpdb.sh /home/ubuntu/kg2-build/smpdb > /home/ubuntu/kg2-build/extract-smpdb.log 2>&1
        (exited with non-zero exit code)

ubuntu@ip-172-31-62-73:~/kg2-build$ curl -L -f -k /home/ubuntu/kg2-build/smpdb/ https://pathbank.org/downloads/pathbank_all_pwml.zip
curl: (3) URL using bad/illegal format or missing URL
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

I am going to try restarting to see if that fixes it.
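
A likely reading of the curl output above: no -o/--output option precedes the local path, so curl treats /home/ubuntu/kg2-build/smpdb/ as a URL (hence error 3) and then writes the real download to the terminal. The sketch below builds a command that names an output file explicitly. The helper name and the choice of output file name are assumptions; this is not a description of how extract-smpdb.sh is written.

```python
import posixpath
import shlex
from typing import List


def build_curl_command(url: str, dest_dir: str) -> List[str]:
    """Build a curl argv that saves `url` into `dest_dir` (hypothetical helper)."""
    # Assumption: name the output file after the last URL path segment.
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    output_path = posixpath.join(dest_dir, filename)
    # -L, -f and -k are copied from the command in the log above;
    # --output tells curl which file to write the download to.
    return ["curl", "-L", "-f", "-k", "--output", output_path, url]


cmd = build_curl_command(
    "https://pathbank.org/downloads/pathbank_all_pwml.zip",
    "/home/ubuntu/kg2-build/smpdb",
)
# Print a shell-quoted version of the command for inspection; nothing is downloaded.
print(shlex.join(cmd))
```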

ecwood, Jul 20 '23 21:07

An error occurred in Ontologies and TTL, due to #303:

Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 1391, in <module>
    save_pickle)
  File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 142, in make_kg2
    assert os.path.exists(ont_source_info_dict['file']), local_file_name
AssertionError: foodon.owl

The download setting in ont-load-inventory.yaml needed to be set to true so that the correct if statement is triggered and the pickle file is used. A better solution should be found in the future.

ecwood, Jul 21 '23 05:07

Here were the major report changes:

There was a significant drop in edges from infores:fma-umls in this build. The count dropped from 368827 to 290475.
There was a significant drop in edges from infores:go in this build. The count dropped from 202983 to 132359.
There was a significant drop in edges from infores:hgnc in this build. The count dropped from 42515 to 23262.
There are no edges from infores:loinc-umls in this build. There were 2690586 in the previous build.
There was a significant drop in edges from infores:ncbi-taxon in this build. The count dropped from 3971385 to 1393247.
There are no edges from infores:vandf-umls in this build. There were 140078 in the previous build.
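
The comparison behind the list above can be sketched as follows. This is a minimal illustration, assuming the per-source edge counts have already been read out of the two kg2-simplified-report.json files into plain dictionaries. The function name and the 20% threshold are assumptions, not the project's actual criteria; the counts are the ones quoted above.

```python
def report_changes(previous, current, drop_threshold=0.2):
    """List sources whose edge count vanished or dropped by at least the threshold."""
    messages = []
    for source in sorted(previous):
        old = previous[source]
        new = current.get(source, 0)  # a missing source counts as zero edges
        if old > 0 and new == 0:
            messages.append(
                f"There are no edges from {source} in this build. "
                f"There were {old} in the previous build."
            )
        elif old > 0 and (old - new) / old >= drop_threshold:
            messages.append(
                f"There was a significant drop in edges from {source} in this build. "
                f"The count dropped from {old} to {new}."
            )
    return messages


# Edge counts quoted in this comment (previous build vs. this build).
previous = {
    "infores:fma-umls": 368827,
    "infores:go": 202983,
    "infores:hgnc": 42515,
    "infores:loinc-umls": 2690586,
    "infores:ncbi-taxon": 3971385,
    "infores:vandf-umls": 140078,
}
current = {
    "infores:fma-umls": 290475,
    "infores:go": 132359,
    "infores:hgnc": 23262,
    "infores:ncbi-taxon": 1393247,
}

messages = report_changes(previous, current)
for line in messages:
    print(line)
```

The threshold is set to 20% here only because the smallest drop in the list, infores:fma-umls, is a little over 21%; a stricter cutoff would not flag it.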

ecwood, Jul 24 '23 21:07