RTX-KG2
Build KG2.8.6
1. Build and load KG2:
- [x] Clear the instance using `bash -x clear-instance.sh`
- [x] Clone the RTX-KG2 repo from GitHub: `git clone https://github.com/RTXteam/RTX-KG2.git`
- [x] Set up the KG2 build system: `bash -x RTX-KG2/setup-kg2-build.sh`
- [x] Check `~/kg2-build/setup-kg2-build.log` to ensure setup completed successfully
- [x] Run a dry build using `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n`
- [x] Check `~/kg2-build/build-kg2-snakemake-n.log` to ensure all rules are included
- [x] Run `touch ~/kg2-build/minor-release` for a minor release or `touch ~/kg2-build/major-release` for a major release. If you don't want to change the version number, skip this step.
- [x] Initiate a screen session: `screen -S buildkg2`
- [x] Start the build: `bash -x ~/kg2-code/build-kg2-snakemake.sh all -F`
- [x] Verify the build completed by checking `~/kg2-build/build-kg2-snakemake.log`
- [x] Check the build version number in `~/kg2-build/kg2-version.txt`
- [ ] Check report file `kg2-simplified-report.json`; compare against the previous `kg2-simplified-report.json` to identify any major changes
- [ ] Generate nodes.tsv and edges.tsv by running `python3 kg2_json_to_kgx_tsv.py kg2-simplified.json`
- [ ] Generate `content-metadata.json` on build instance
- [ ] Push nodes.tsv and edges.tsv to public S3 bucket with `aws s3 cp /file/name s3://rtx-kg2-public`
- [ ] Find an available kg2endpoint by checking `rtx.ai` under `Networking` on Lightsail
- [ ] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  - [ ] Update code on kg2endpoint, then run `setup-kg2-neo4j.sh` if necessary
  - [ ] Load KG2 into Neo4j: `RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1`
- [ ] Update kg2-versions.md
  - [ ] Update version numbers of upstream knowledge sources for the new version of KG2 in `kg2-versions.md` (see Cypher command below).
Example Cypher to get versions of many of the knowledge sources in a specific build of KG2pre:
match (n:`biolink:RetrievalSource`) where not n.id =~ 'umls_.*' and not n.id =~ 'OBO:.*' return n.id, n.name order by n.id;
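
The report-comparison step in the checklist (checking `kg2-simplified-report.json` against the previous build's report) could be partly automated. The following is only a sketch and is not part of RTX-KG2. It assumes the report is a JSON object whose nested dictionaries end in numeric counts; the real layout of the report file may differ. The function names `flatten_counts` and `diff_reports` and the 10% threshold are hypothetical choices for illustration.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def flatten_counts(obj: Any, prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted_path, value) for every numeric leaf in nested dicts."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from flatten_counts(value, path)
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        yield prefix, obj


def diff_reports(
    old: dict, new: dict, threshold: float = 0.10
) -> Dict[str, Tuple[float, float]]:
    """Return paths whose count changed by more than `threshold` (relative).

    A path present in only one report is treated as 0 on the missing side.
    """
    old_counts = dict(flatten_counts(old))
    new_counts = dict(flatten_counts(new))
    changed = {}
    for path in sorted(set(old_counts) | set(new_counts)):
        before = old_counts.get(path, 0)
        after = new_counts.get(path, 0)
        base = max(abs(before), 1)  # guard against division by zero
        if abs(after - before) / base > threshold:
            changed[path] = (before, after)
    return changed


# Example use (file names are placeholders for the two report files):
#   with open("previous-report.json") as f_old, open("new-report.json") as f_new:
#       for path, (before, after) in diff_reports(json.load(f_old), json.load(f_new)).items():
#           print(f"{path}: {before} -> {after}")
```

A relative threshold keeps small fluctuations in large counts out of the output while still flagging categories that appear or disappear between builds.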
Source predicate curie is missing from the YAML config file: ORPHANET:C057
Source predicate curie is missing from the YAML config file: RO:0002428
Source predicate curie is missing from the YAML config file: ORPHANET:C056
Source predicate curie is missing from the YAML config file: DrugCentral:reduce_risk
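
Messages like the ones above can be collected into a de-duplicated list before the YAML config file is updated. This is a minimal sketch under stated assumptions: the message wording is taken from the lines above, while the function name `missing_predicate_curies` is hypothetical, and which log file contains these messages is not stated here.

```python
import re
from typing import List

# Matches the warning text quoted above and captures the CURIE that follows it.
MISSING_CURIE = re.compile(
    r"Source predicate curie is missing from the YAML config file:\s*(\S+)"
)


def missing_predicate_curies(log_text: str) -> List[str]:
    """Return the distinct CURIEs reported as missing, sorted alphabetically."""
    return sorted(set(MISSING_CURIE.findall(log_text)))
```

Passing the text of a build log to `missing_predicate_curies` gives one entry per missing predicate CURIE, regardless of how many times each warning was repeated.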
@acevedol Looks like some of the checkboxes could be checked here (e.g., Neo4j endpoint); can we please get a status refresh on the checklist? Thanks.