GraphCare
Having trouble reproducing the results
Hi,
Thank you for the great work.
I followed the current pipeline to try to reproduce the results, but I cannot outperform the RNN/Transformer baselines. It seems that the KG used in the provided code is built only from GPT-KG and is not merged with UMLS-KG. Is that correct?
The steps I ran are:
**run /graphcare_/graph_generation/graph_gen.ipynb
outputs:
/graphs/condition/CCSCM/{key}.txt
/graphs/procedure/CCSPROC/{key}.txt
/graphs/drug/ATC3/{key}.txt
**run /graphcare_/graph_generation/umls_emb_ret.py
outputs:
/data/pj20/exp_data/umls_ent_emb_.pkl
**run /graphcare_/graph_generation/umls_sim_retriever.py
outputs:
/data/pj20/exp_data/ccscm2umls.pkl
/data/pj20/exp_data/ccsproc2umls.pkl
/data/pj20/exp_data/atc32umls.pkl
**run /KG_sampling/umls_sampling.py
outputs:
/graphs/ccscm_umls
/graphs/ccsproc_umls
/graphs/atc3_umls
**run /graphcare_/graph_generation/{cond,proc,drug}_emb_ret.ipynb
outputs:
/graphs/condition/CCSCM
/graphs/procedure/CCSPROC
/graphs/drug/ATC5
id2ent.json ent2id.json id2rel.json rel2id.json
entity_embedding.pkl relation_embedding.pkl
→ required by data_prepare.py
**run /graphcare_/graph_generation/ehr_emb_ret.ipynb → get the clusters
inputs:
path_1 = "../../data/pj20/exp_data/ccscm_ccsproc"
path_1_ = "../../graphs/cond_proc/CCSCM_CCSPROC"
ent2id.json entity_embedding.pkl clusters_th015.json clusters_inv_th015.json
path_2 = "../../data/pj20/exp_data/ccscm_ccsproc_atc3"
path_2_ = "../../graphs/cond_proc_drug/CCSCM_CCSPROC_ATC3"
ent2id.json entity_embedding.pkl clusters_th015.json clusters_inv_th015.json
outputs:
path_1 path_2
ccscm_id2clus.json ccsproc_id2clus.json atc3_id2clus.json
→ required by graphcare.py
Note: it looks like clusters_th015.json is not actually needed here. Is that right?
**run data_prepare.py
outputs:
sample_dataset_{dataset}_{task}_th015.pkl
graph_{dataset}_{task}_th015.pkl
→ required by graphcare.py as dataset
**run graphcare.py
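To rule out a missing intermediate file, the outputs listed above can be checked before running graphcare.py. This is only a minimal sketch: the paths are copied from the steps above and treated as relative to a project root, and both the root location and the helper name `missing_outputs` are assumptions, not part of the repository.

```python
from pathlib import Path

# Paths copied from the steps above, written relative to an assumed project root.
EXPECTED_OUTPUTS = [
    "data/pj20/exp_data/umls_ent_emb_.pkl",
    "data/pj20/exp_data/ccscm2umls.pkl",
    "data/pj20/exp_data/ccsproc2umls.pkl",
    "data/pj20/exp_data/atc32umls.pkl",
    "graphs/ccscm_umls",
    "graphs/ccsproc_umls",
    "graphs/atc3_umls",
]


def missing_outputs(root, expected=EXPECTED_OUTPUTS):
    """Return the expected paths (files or directories) that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # "." is a placeholder; point it at wherever data/ and graphs/ actually live.
    for path in missing_outputs("."):
        print("missing:", path)
```

An empty result only shows that each step wrote something. It does not show that the UMLS triples are actually read by the later steps.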
Could you kindly explain how to merge the KGs and reproduce the results?
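To make the question concrete, here is a minimal sketch of the kind of merge I have in mind: a plain set union of the triples for one code. It assumes each file holds one tab-separated (head, relation, tail) triple per line, which may not match the real file format. The function names `read_triples` and `merge_kg_files` are hypothetical and not taken from the repository.

```python
from pathlib import Path


def read_triples(path):
    """Read triples from a file, assuming one tab-separated (head, relation, tail) per line.

    Lines that do not split into exactly three non-empty fields are skipped.
    A missing file yields an empty set.
    """
    triples = set()
    p = Path(path)
    if not p.exists():
        return triples
    for line in p.read_text(encoding="utf-8").splitlines():
        parts = [s.strip() for s in line.split("\t")]
        if len(parts) == 3 and all(parts):
            triples.add(tuple(parts))
    return triples


def merge_kg_files(gpt_path, umls_path, out_path):
    """Write the de-duplicated union of two triple files and return the triple count."""
    merged = read_triples(gpt_path) | read_triples(umls_path)
    lines = ["\t".join(t) for t in sorted(merged)]
    Path(out_path).write_text(
        "\n".join(lines) + ("\n" if lines else ""), encoding="utf-8"
    )
    return len(merged)
```

If the intended merge does more than a union (for example entity alignment or clustering across the two KGs), a pointer to where that happens in the code would be very helpful.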
Thanks a lot!