TDC

New DrugComb data

Open • TangYiChing opened this issue on Jan 06 '23 • 4 comments

Describe the problem: The DrugComb database has released new drug combination and monotherapy screening datasets covering cancer, malaria, and COVID-19.
Reference: https://doi.org/10.1093/nar/gkab438

Describe the solution you'd like: Replace the current TDC/data/drugcomb.pkl with the new release from https://drugcomb.org/download/, and add two new columns, 'Study name' and 'Disease', so that cancer, malaria, and COVID-19 screens can be distinguished.

Additional context: N/A.

TangYiChing, Jan 06 '23 22:01
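The issue proposes tagging each row of the new DrugComb data with 'Study name' and 'Disease' columns. A minimal sketch of that step, assuming each downloaded row carries a study identifier and that a hand-maintained lookup maps studies to disease areas. The source field name `study_name`, the `STUDY_TO_DISEASE` entries, and the helper `add_study_columns` are hypothetical; the real field and study names must be read from the DrugComb download itself. It works on plain records (the shape pandas' `DataFrame.to_dict("records")` returns) so it runs without extra dependencies.

```python
from typing import Dict, Iterable, List

# Hypothetical study -> disease mapping; placeholder names only.
STUDY_TO_DISEASE: Dict[str, str] = {
    "EXAMPLE_CANCER_STUDY": "cancer",
    "EXAMPLE_MALARIA_STUDY": "malaria",
    "EXAMPLE_COVID_STUDY": "COVID-19",
}


def add_study_columns(
    rows: Iterable[dict],
    study_to_disease: Dict[str, str],
    study_field: str = "study_name",  # hypothetical source field name
) -> List[dict]:
    """Return copies of `rows` with 'Study name' and 'Disease' added.

    Rows whose study is not in the mapping get Disease = "unknown",
    so they can be reviewed instead of being silently dropped.
    """
    tagged: List[dict] = []
    for row in rows:
        study = row.get(study_field, "")
        new_row = dict(row)  # copy so the input records are not mutated
        new_row["Study name"] = study
        new_row["Disease"] = study_to_disease.get(study, "unknown")
        tagged.append(new_row)
    return tagged


if __name__ == "__main__":
    sample = [{"study_name": "EXAMPLE_MALARIA_STUDY", "drug_row": "A", "drug_col": "B"}]
    print(add_study_columns(sample, STUDY_TO_DISEASE)[0]["Disease"])
```

Keeping unmapped studies as "unknown" rather than raising makes it easy to spot new studies in a future DrugComb release.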

Thank you! This is a great idea! Would you like to make a PR for it?

kexinhuang12345, Jan 09 '23 05:01

DrugComb provides an API for quick access to both drug and cell line information, and it already includes SMILES strings and cell line IDs. To add new drug-drug-cell line triplets to the current TDC dataset, the only missing piece is the gene expression values from the CellMiner database. What would you like me to do to facilitate the process?

TangYiChing, Jan 10 '23 16:01
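The comment above notes that DrugComb already supplies SMILES strings and cell line IDs, so building a drug-drug-cell line triplet only needs an expression profile per cell line. A minimal sketch of that join, assuming the lookups have already been filled (from the DrugComb API and an expression source); combinations with no expression profile are reported separately. `Triplet`, `build_triplets`, and the lookup shapes are hypothetical illustrations, not TDC or DrugComb APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Combo = Tuple[str, str, str]  # (drug 1 name, drug 2 name, cell line ID)


@dataclass
class Triplet:
    drug1_smiles: str
    drug2_smiles: str
    cell_line_id: str
    expression: Sequence[float]  # one value per gene, in a fixed gene order


def build_triplets(
    combos: Sequence[Combo],
    smiles: Dict[str, str],                  # drug name -> SMILES
    expression: Dict[str, Sequence[float]],  # cell line ID -> expression vector
) -> Tuple[List[Triplet], List[Combo]]:
    """Join combinations with SMILES and expression; return (triplets, skipped)."""
    triplets: List[Triplet] = []
    skipped: List[Combo] = []
    for drug1, drug2, cell in combos:
        if drug1 not in smiles or drug2 not in smiles or cell not in expression:
            # A drug has no SMILES, or the cell line has no expression profile.
            skipped.append((drug1, drug2, cell))
            continue
        triplets.append(Triplet(smiles[drug1], smiles[drug2], cell, expression[cell]))
    return triplets, skipped
```

Returning the skipped combinations instead of dropping them gives a direct count of how much coverage each candidate expression source would add.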

Thank you! Are the gene expression values available only in CellMiner? The paper suggests they can also be retrieved from public databases such as DepMap and Cell Model Passports (Figure 1: https://academic.oup.com/view-large/figure/267020980/gkab438fig1.jpg).

kexinhuang12345, Jan 12 '23 05:01

Yes, these are the commonly used sources nowadays, and they all provide RNA-seq data (i.e., expression values in TPM). We might need a new data-processing workflow.

TangYiChing, Jan 12 '23 19:01
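Since the thread concludes that the expression sources are RNA-seq in TPM, here is a reference for what that unit means: TPM normalizes each gene's read count by its length and rescales so that a sample sums to one million, TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 10^6. A minimal sketch of that step plus the log2(TPM + 1) transform often applied before modeling, assuming raw counts and gene lengths are available per cell line; if a source already ships TPM or log-transformed TPM, only the matching step applies. The function names are illustrative, not part of any existing workflow.

```python
import math
from typing import List, Sequence


def counts_to_tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Convert one sample's raw read counts to TPM.

    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6.
    Any consistent length unit gives the same result, since it cancels.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # length-normalized counts
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # an all-zero sample stays all zero
    return [r / total * 1e6 for r in rates]


def log_transform(tpm: Sequence[float]) -> List[float]:
    """Apply log2(TPM + 1) to compress the dynamic range of expression values."""
    return [math.log2(x + 1.0) for x in tpm]
```

For example, counts of 10, 20, and 30 on genes of 1000, 2000, and 3000 bp have equal per-base rates, so each gene receives one third of a million TPM.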