RTX-KG2
Error in DGIdb Conversion
Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 167, in <module>
    make_kg2_graph(input_file_name, nodes_output, edges_output, test_mode)
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 83, in make_kg2_graph
    PMIDs] = fields
ValueError: too many values to unpack (expected 11)
The fields that our conversion script dgidb_tsv_to_kg_jsonl.py currently unpacks from each row are:
[gene_name,
gene_claim_name,
entrez_id,
interaction_claim_source,
interaction_types,
drug_claim_name,
drug_claim_primary_name,
drug_name,
drug_concept_id,
_, #12.5.2020 new field in tsv: interaction group score
PMIDs] = fields
However, it looks like DGIdb recently changed the file structure. Each row of interactions.tsv now has 13 fields, which is why unpacking into 11 names fails:
gene_claim_name
gene_concept_id
gene_name
interaction_source_db_name
interaction_source_db_version
interaction_type
interaction_score
drug_claim_name
drug_concept_id
drug_name
approved
immunotherapy
anti_neoplastic
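To make the mismatch concrete, here is a minimal sketch that compares the two column lists above. The helper `diff_columns` is hypothetical (it is not in dgidb_tsv_to_kg_jsonl.py), and the name `interaction_group_score` is an assumption taken from the comment on the `_` placeholder in the old unpacking code.

```python
# Old field names, copied from the unpacking statement in the script.
# "interaction_group_score" is an assumed name for the "_" placeholder.
OLD_FIELDS = [
    "gene_name", "gene_claim_name", "entrez_id", "interaction_claim_source",
    "interaction_types", "drug_claim_name", "drug_claim_primary_name",
    "drug_name", "drug_concept_id", "interaction_group_score", "PMIDs",
]

# New field names, copied from the current interactions.tsv listing.
NEW_FIELDS = [
    "gene_claim_name", "gene_concept_id", "gene_name",
    "interaction_source_db_name", "interaction_source_db_version",
    "interaction_type", "interaction_score", "drug_claim_name",
    "drug_concept_id", "drug_name", "approved", "immunotherapy",
    "anti_neoplastic",
]


def diff_columns(expected, actual):
    """Return (missing, added): names only in expected, names only in actual."""
    missing = [name for name in expected if name not in actual]
    added = [name for name in actual if name not in expected]
    return missing, added


missing, added = diff_columns(OLD_FIELDS, NEW_FIELDS)
print("no longer present:", missing)
print("newly present:", added)
```

Only five names survive unchanged (gene_name, gene_claim_name, drug_claim_name, drug_name, drug_concept_id), and their positions have moved, so adding two more placeholders to the unpacking statement would not be enough.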
Per the Slack DM thread, the current plan is to use the May 2021 release of DGIdb for the KG2.9.0pre build.
After that, we'll update the DGIdb ETL script for RTX-KG2 so that subsequent builds can use the latest DGIdb release.
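One possible direction for the ETL update, offered only as a sketch: read rows by column name instead of by position, and fail early with a clear message if a needed column is absent. This assumes interactions.tsv starts with a header row. The function `read_interactions`, the `REQUIRED_COLUMNS` subset, and the sample row are all hypothetical; `csv.DictReader` is from the Python standard library.

```python
import csv
import io

# Hypothetical subset of columns the ETL would need; adjust as required.
REQUIRED_COLUMNS = {"gene_name", "drug_name", "interaction_type"}


def read_interactions(tsv_file, required=REQUIRED_COLUMNS):
    """Yield each row as a dict keyed by the header names.

    Raises ValueError on first iteration if a required column is missing,
    so a future DGIdb format change fails with a readable message.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    header = set(reader.fieldnames or [])
    missing = sorted(required - header)
    if missing:
        raise ValueError(f"interactions.tsv is missing columns: {missing}")
    for row in reader:
        yield row


# Made-up sample data, for illustration only (not real DGIdb content).
sample = (
    "gene_name\tdrug_name\tinteraction_type\tapproved\n"
    "GENE1\tDRUG1\tinhibitor\tTRUE\n"
)
for row in read_interactions(io.StringIO(sample)):
    print(row["gene_name"], row["drug_name"], row["interaction_type"])
```

With name-based access, extra or reordered columns in a later release would not break the unpacking, and a renamed column would be reported by name rather than as a count mismatch.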