Error in DGIdb Conversion

Open • acevedol opened this issue 1 year ago • 1 comment

Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 167, in <module>
    make_kg2_graph(input_file_name, nodes_output, edges_output, test_mode)
  File "/home/ubuntu/kg2-code/dgidb_tsv_to_kg_jsonl.py", line 83, in make_kg2_graph
    PMIDs] = fields
ValueError: too many values to unpack (expected 11)

The 11 fields that our conversion script dgidb_tsv_to_kg_jsonl.py currently unpacks from each row are

[gene_name,
 gene_claim_name,
 entrez_id,
 interaction_claim_source,
 interaction_types,
 drug_claim_name,
 drug_claim_primary_name,
 drug_name,
 drug_concept_id,
 _,  # 12.5.2020 new field in tsv: interaction group score
 PMIDs] = fields

but it looks like DGIdb recently changed the file layout, and interactions.tsv now has 13 columns (hence "too many values to unpack"):

gene_claim_name
gene_concept_id
gene_name
interaction_source_db_name
interaction_source_db_version
interaction_type
interaction_score
drug_claim_name
drug_concept_id
drug_name
approved
immunotherapy
anti_neoplastic
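
One way to make the conversion less sensitive to column order and count is to read rows by header name instead of unpacking by position. The following is a minimal sketch, not the actual code in dgidb_tsv_to_kg_jsonl.py. It assumes the new interactions.tsv begins with a header line containing the 13 names above; `read_interactions` is a hypothetical helper and the data row contains placeholder values, not real DGIdb records.

```python
import csv
import io

# Column names as listed in this issue for the new interactions.tsv.
COLUMNS = [
    "gene_claim_name", "gene_concept_id", "gene_name",
    "interaction_source_db_name", "interaction_source_db_version",
    "interaction_type", "interaction_score",
    "drug_claim_name", "drug_concept_id", "drug_name",
    "approved", "immunotherapy", "anti_neoplastic",
]

# Placeholder row (NOT real DGIdb data), one value per column.
PLACEHOLDER_ROW = [
    "GENE_X", "hgnc:0000", "GENE_X",
    "SomeSource", "v1",
    "inhibitor", "0.5",
    "DRUG_Y", "some:0000", "DRUG_Y",
    "TRUE", "FALSE", "FALSE",
]

# Assumption: the file starts with a tab-separated header line.
SAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(PLACEHOLDER_ROW) + "\n"


def read_interactions(handle):
    """Yield one dict per data row, keyed by the names in the header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    yield from reader


rows = list(read_interactions(io.StringIO(SAMPLE_TSV)))
first = rows[0]
# Fields are looked up by name, so added or reordered columns do not
# break the lookup the way positional unpacking does.
print(first["gene_name"], first["interaction_type"], first["drug_name"])
```

With name-based access, a new column in a later release would be ignored instead of raising the ValueError shown in the traceback, while a removed or renamed column would raise a KeyError that names the missing field.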

acevedol, Mar 13 '24 15:03

Per the Slack DM thread, the current plan is to use the May 2021 release of DGIdb for the KG2.9.0pre build.

Then we'll update the DGIdb ETL script for RTX-KG2 so that subsequent builds can use the latest DGIdb release.
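
When the ETL script is updated, a header check at the start of the run could make a future layout change fail with a clearer message than an unpacking error. This is only a sketch under the assumption that the file has a tab-separated header line; `EXPECTED_COLUMNS` and `check_header` are hypothetical names, not part of the existing script.

```python
# Column names reported in this issue for the new interactions.tsv.
EXPECTED_COLUMNS = [
    "gene_claim_name", "gene_concept_id", "gene_name",
    "interaction_source_db_name", "interaction_source_db_version",
    "interaction_type", "interaction_score",
    "drug_claim_name", "drug_concept_id", "drug_name",
    "approved", "immunotherapy", "anti_neoplastic",
]


def check_header(header_line):
    """Raise ValueError if the header does not match EXPECTED_COLUMNS exactly.

    Hypothetical helper: compares column names only, ignoring order.
    """
    found = header_line.rstrip("\r\n").split("\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    extra = [c for c in found if c not in EXPECTED_COLUMNS]
    if missing or extra:
        raise ValueError(
            f"unexpected DGIdb header: missing={missing}, extra={extra}"
        )


# A header with the expected names passes silently.
check_header("\t".join(EXPECTED_COLUMNS) + "\n")

# A header in the older style (for example, one that still has entrez_id
# and PMIDs) is rejected with a message listing the differences.
try:
    check_header("gene_name\tgene_claim_name\tentrez_id\tPMIDs\n")
except ValueError as err:
    print(err)
```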

saramsey, Mar 13 '24 15:03