Build KG2.7.6

Open acevedol opened this issue 3 years ago • 12 comments

1. Partial Build and load KG2:
  • [x] Run a dry build using bash -x ~/kg2-code/build-kg2-snakemake.sh -n
  • [x] Check ~/kg2-build/build-kg2-snakemake-n.log to ensure all expected rules are included
  • [x] Initiate a screen session screen -S buildkg2
  • [x] Start the build bash -x ~/kg2-code/build-kg2-snakemake.sh
  • [x] Verify build completed by checking ~/kg2-build/build-kg2-snakemake.log
  • [x] Check the build version number in ~/kg2-build/kg2-version.txt
  • [x] Check report file kg2-simplified-report.json; compare against previous kg2-simplified-report.json to identify any major changes
  • [x] Tag build with version number
  • [x] Generate nodes.tsv and edges.tsv by running python3 kg2_json_to_kgx_tsv.py kg2-simplified.json
  • [x] Generate content-metadata.json on build instance
  • [x] Push nodes.tsv, edges.tsv, and content-metadata.json to the public S3 bucket with aws s3 cp /file/name s3://rtx-kg2-public
  • [x] Find an available kg2endpoint by checking rtx.ai under Networking on Lightsail
  • [x] Install the new KG2 TSV files into Neo4j on the kg2endpoint
  • [x] Update code on kg2endpoint, then run setup-kg2-neo4j.sh if necessary
  • [x] Load KG2 into Neo4j: RTX-KG2/tsv-to-neo4j.sh > ~/kg2-build/tsv-to-neo4j.log 2>&1
  • [x] Update kg2-versions.md
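
For the "verify build completed" step above, a small helper can list which rules Snakemake reported as failed. This is only a sketch: it assumes the log uses the "Error in rule <name>:" lines seen in the excerpts later in this thread, and the function name failed_rules is made up for illustration.

```python
import re
from typing import List

# Matches Snakemake's "Error in rule <name>:" lines (format taken from the
# log excerpts quoted in this thread; other Snakemake versions may differ).
ERROR_RULE_RE = re.compile(r"^Error in rule (\w+):", re.MULTILINE)


def failed_rules(log_text: str) -> List[str]:
    """Return names of rules reported as failed, in first-seen order, without duplicates."""
    seen: List[str] = []
    for name in ERROR_RULE_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Sample text assembled from lines quoted in this thread.
    sample = (
        "[Fri May  6 19:09:06 2022]\n"
        "Error in rule UniProtKB_Conversion:\n"
        "    jobid: 8\n"
        "Error in rule Finish:\n"
        "    jobid: 0\n"
    )
    print(failed_rules(sample))
```

In practice the text would come from reading ~/kg2-build/build-kg2-snakemake.log; an empty result does not by itself prove that the build finished, only that no rule error was logged.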

acevedol avatar May 06 '22 18:05 acevedol

Because the error was limited to a few UBERON predicates in predicate-remap.yaml, we don't want to rerun the extraction steps in general. This build will also address issue 196, though, so I will extract NCBI again. I removed kg2-simplified.json to trigger Simplify and the downstream rules, /ncbigene to extract the updated NCBI ontology, and kg2-ont.json to rerun rule Ontologies_and_TTL. I'm not sure which additional files need to be deleted for a partial build, so I am testing with a dry run to verify which rules this triggers.
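
To reason about which rules a deleted file should trigger before doing the dry run, the idea can be sketched as follows. The rule graph here is HYPOTHETICAL: only kg2-ont.json, kg2-simplified.json, Ontologies_and_TTL, Simplify and Finish appear in this thread, while "Merge", "kg2-merged.json", "done.flag" and all of the wiring are placeholders. It models only "a missing output reruns its rule and everything downstream", not Snakemake's timestamp checks.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL, simplified rule graph: rule name -> (input files, output files).
# "Merge", "kg2-merged.json" and "done.flag" are invented placeholders.
RULES: Dict[str, Tuple[List[str], List[str]]] = {
    "Ontologies_and_TTL": ([], ["kg2-ont.json"]),
    "Merge": (["kg2-ont.json"], ["kg2-merged.json"]),
    "Simplify": (["kg2-merged.json"], ["kg2-simplified.json"]),
    "Finish": (["kg2-simplified.json"], ["done.flag"]),
}


def rules_to_rerun(rules: Dict[str, Tuple[List[str], List[str]]],
                   removed: Set[str]) -> List[str]:
    """Return rules that must rerun because an output was removed or an input is stale."""
    stale = set(removed)
    rerun: List[str] = []
    changed = True
    while changed:  # repeat until no new rule becomes stale
        changed = False
        for name, (inputs, outputs) in rules.items():
            if name in rerun:
                continue
            if stale.intersection(outputs) or stale.intersection(inputs):
                rerun.append(name)
                stale.update(outputs)  # its outputs will be regenerated
                changed = True
    return rerun


if __name__ == "__main__":
    print(rules_to_rerun(RULES, {"kg2-simplified.json"}))
    print(rules_to_rerun(RULES, {"kg2-ont.json"}))
```

The dry run (build-kg2-snakemake.sh -n) remains the authoritative check, since it uses the real Snakefile rather than a hand-written graph.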

acevedol avatar May 06 '22 18:05 acevedol

A dry build shows that the expected rules will run, so I am moving forward with a test run.

acevedol avatar May 06 '22 19:05 acevedol

First issue:

[Fri May  6 19:09:06 2022]
Error in rule UniProtKB_Conversion:
    jobid: 8
    output: /home/ubuntu/kg2-build/kg2-uniprotkb.json
    log: /home/ubuntu/kg2-build/uniprotkb-dat-to-json.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/uniprotkb_dat_to_json.py  /home/ubuntu/kg2-build/uniprotkb/uniprot_sprot.dat /home/ubuntu/kg2-build/kg2-uniprotkb.json > /home/ubuntu/kg2-build/uniprotkb-dat-to-json.log 2>&1
        (exited with non-zero exit code)

More info in log:

Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/uniprotkb_dat_to_json.py", line 353, in <module>
    test_mode)
  File "/home/ubuntu/kg2-code/uniprotkb_dat_to_json.py", line 122, in parse_records_from_uniprot_dat
    return [record_list, update_date, version]
UnboundLocalError: local variable 'update_date' referenced before assignment
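
This kind of UnboundLocalError typically means a variable is assigned only inside a branch that never ran, for example because the input file was empty or truncated. The sketch below illustrates the pattern and one defensive fix. It is not the code in uniprotkb_dat_to_json.py: the function names and the choice of the "DT" line prefix are assumptions made for illustration.

```python
from typing import List, Optional


def parse_update_date(lines: List[str]) -> str:
    """Buggy pattern: update_date is bound only if a matching line is seen."""
    for line in lines:
        if line.startswith("DT"):
            update_date = line.split()[1].rstrip(",")
    return update_date  # raises UnboundLocalError if no line matched


def parse_update_date_safe(lines: List[str]) -> str:
    """Same logic, but fails with an explicit message when nothing matched."""
    update_date: Optional[str] = None
    for line in lines:
        if line.startswith("DT"):
            update_date = line.split()[1].rstrip(",")
    if update_date is None:
        raise ValueError("no DT line found; is the input file empty or truncated?")
    return update_date


if __name__ == "__main__":
    try:
        parse_update_date([])  # empty input reproduces the error class
    except UnboundLocalError as err:
        print("buggy version:", err)
    try:
        parse_update_date_safe([])
    except ValueError as err:
        print("safe version:", err)
```

If this is what happened here, checking the size and contents of uniprot_sprot.dat would be a reasonable first step.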

acevedol avatar May 06 '22 19:05 acevedol

Another error:

[Fri May  6 19:09:05 2022]
Error in rule HMDB_Conversion:
    jobid: 18
    output: /home/ubuntu/kg2-build/kg2-hmdb.json
    log: /home/ubuntu/kg2-build/hmdb-xml-to-kg-json.log (check log file(s) for error message)
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/hmdb_xml_to_kg_json.py  /home/ubuntu/kg2-build/hmdb_metabolites.xml /home/ubuntu/kg2-build/kg2-hmdb.json > /home/ubuntu/kg2-build/hmdb-xml-to-kg-json.log 2>&1

With additional info:

Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/hmdb_xml_to_kg_json.py", line 625, in <module>
    metabolite_data = xmltodict.parse(xml_file.read())
  File "/home/ubuntu/kg2-venv/lib/python3.7/site-packages/xmltodict.py", line 327, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0
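
"no element found: line 1, column 0" is what expat reports when it is handed no content at all, so this traceback is consistent with an empty hmdb_metabolites.xml (the thread does not confirm that). Below is a minimal sketch of a guard that fails early with a clearer message. It uses the standard-library expat parser rather than xmltodict, and the function name is hypothetical.

```python
import os
import tempfile
import xml.parsers.expat


def parse_xml_or_explain(path: str) -> None:
    """Parse an XML file, failing early with a clear message if it is empty."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; the download or extraction step may have failed")
    parser = xml.parsers.expat.ParserCreate()
    with open(path, "rb") as xml_file:
        parser.ParseFile(xml_file)  # raises ExpatError on malformed XML


if __name__ == "__main__":
    # What expat itself says about empty input.
    try:
        xml.parsers.expat.ParserCreate().Parse(b"", True)
    except xml.parsers.expat.ExpatError as err:
        print("expat on empty input:", err)

    # The guard on an empty temporary file.
    with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
        empty_path = tmp.name
    try:
        parse_xml_or_explain(empty_path)
    except ValueError as err:
        print("guarded:", err)
    finally:
        os.remove(empty_path)
```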

acevedol avatar May 06 '22 21:05 acevedol

Error in rule KEGG_Conversion:

[Fri May  6 19:09:06 2022]
Error in rule KEGG_Conversion:
    jobid: 26
    output: /home/ubuntu/kg2-build/kg2-kegg.json
    shell:
        /home/ubuntu/kg2-venv/bin/python3 -u /home/ubuntu/kg2-code/kegg_json_to_kg_json.py  /home/ubuntu/kg2-build/kegg.json /home/ubuntu/kg2-build/kg2-kegg.json
        (exited with non-zero exit code)

acevedol avatar May 06 '22 21:05 acevedol

Error in rule Finish:

Error in rule Finish:
    jobid: 0

RuleException:
AttributeError in line 13 of /home/ubuntu/kg2-code/Snakefile:
'InputFiles' object has no attribute 'simplified_output_nodes_file_full'
  File "/home/ubuntu/kg2-code/Snakefile", line 13, in __rule_Finish
  File "/usr/lib/python3.7/string.py", line 186, in format
  File "/usr/lib/python3.7/string.py", line 190, in vformat
  File "/usr/lib/python3.7/string.py", line 230, in _vformat
  File "/usr/lib/python3.7/string.py", line 301, in get_field
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
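
The traceback passes through string.py's format machinery, which suggests the shell command of rule Finish contains an {input.simplified_output_nodes_file_full} placeholder that does not match any named input of that rule. The sketch below reproduces that failure mode with plain str.format and shows a way to list unmatched placeholders. SimpleNamespace stands in for Snakemake's InputFiles object, and the attribute name simplified_output_file_full and the command template are hypothetical.

```python
from string import Formatter
from types import SimpleNamespace
from typing import List


def missing_input_fields(template: str, inputs: object) -> List[str]:
    """List NAMEs from '{input.NAME}' placeholders that are not attributes of inputs."""
    missing: List[str] = []
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field and field.startswith("input."):
            name = field.split(".", 1)[1]
            if not hasattr(inputs, name):
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Stand-in for the rule's named inputs (attribute name is hypothetical).
    inputs = SimpleNamespace(simplified_output_file_full="kg2-simplified.json")
    # Hypothetical command template referencing a name the rule does not define.
    template = "gzip -fk {input.simplified_output_nodes_file_full}"

    print(missing_input_fields(template, inputs))
    try:
        template.format(input=inputs)
    except AttributeError as err:
        print("format failed:", err)
```

Comparing the placeholders in the Finish rule's shell block against the names in its input: section of the Snakefile would confirm or rule out this explanation.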

acevedol avatar May 08 '22 17:05 acevedol

Problems with edges:

There are no edges from infores:intact in this build. There were 271577 in the previous build.
There are no edges from infores:ncbi-taxonomy in this build. There were 3552996 in the previous build.
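
A check like the one above can be automated by comparing per-source edge counts between two builds. This is a sketch under assumptions: the thread does not show the layout of kg2-simplified-report.json, so the function takes plain source-to-count dictionaries that the caller would first extract from each report. The two infores counts come from this thread; "infores:example" and its counts are placeholders.

```python
from typing import Dict, List, Tuple


def vanished_sources(previous: Dict[str, int],
                     current: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (source, previous_count) for sources with edges before but none now."""
    return sorted(
        (source, count)
        for source, count in previous.items()
        if count > 0 and current.get(source, 0) == 0
    )


if __name__ == "__main__":
    previous = {
        "infores:intact": 271577,
        "infores:ncbi-taxonomy": 3552996,
        "infores:example": 10,  # placeholder, not a real count
    }
    current = {"infores:example": 12}  # placeholder, not a real count
    for source, count in vanished_sources(previous, current):
        print(f"There are no edges from {source} in this build. "
              f"There were {count} in the previous build.")
```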

acevedol avatar May 08 '22 17:05 acevedol

I re-ran the IntAct conversion and the results look better, but I'm still not sure what's going on with NCBI Taxon & Taxonomy.

acevedol avatar May 09 '22 18:05 acevedol

The wrong version is showing in s3://rtx-kg2-public/kg2-version.txt: it says 2.7.7 instead of 2.7.6.

acevedol avatar May 10 '22 16:05 acevedol

The kg2-simplified-report.json no longer has any rename: predicates.

acevedol avatar May 11 '22 21:05 acevedol

Loading KG2.7.6 onto KG2endpoint3

acevedol avatar May 12 '22 21:05 acevedol

Pushed KG2.7.6 tag

acevedol avatar May 16 '22 22:05 acevedol