Rat Liver Microsomal Stability - new dataset

Open iwwwish opened this issue 1 year ago • 1 comments

Describe the problem

Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. It is usually assessed in microsomal fractions, and only the best compounds progress further in the drug discovery process. The National Center for Advancing Translational Sciences (NCATS) runs a high-throughput, single-time-point substrate depletion assay in rat liver microsomes (RLM). Metabolic stability data from this assay for 2528 compounds were made public via a PubChem deposition [1]. RLM data for 220 approved drugs that are routinely screened in different drug repurposing projects were also released [2]; these can serve as an independent validation set. Currently, the only datasets TDC hosts under the metabolism category of the ADME tasks are the CYP450 isoform datasets. This dataset would therefore give TDC users an additional metabolism-related dataset, one that captures metabolism mediated by multiple CYP450 isoforms.

References: [1] and [2]

Describe the solution you'd like

from tdc.single_pred import ADME
data = ADME(name='RLM_NCATS')  # proposed dataset name
split = data.get_split()

df = data.get_approved_set()  # proposed method: independent validation set of approved drugs
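
To make the requested behavior concrete, here is a minimal sketch that does not depend on TDC. The class `RLMDatasetSketch`, its fields, and the toy records are all hypothetical and only illustrate the idea: `get_split()` returns a dict with `train`, `valid`, and `test` keys, as existing TDC loaders do, while `get_approved_set()` returns the approved-drug records, which are kept out of the splits. The actual integration may look quite different.

```python
import random
from dataclasses import dataclass


@dataclass
class RLMDatasetSketch:
    """Hypothetical stand-in for the proposed ADME(name='RLM_NCATS') loader."""
    main: list      # records from the public deposition (toy placeholders here)
    approved: list  # approved-drug records, held out as a validation set

    def get_split(self, frac=(0.7, 0.1, 0.2), seed=42):
        # Seeded random split into train/valid/test by the given fractions.
        rows = list(self.main)
        random.Random(seed).shuffle(rows)
        n_train = int(frac[0] * len(rows))
        n_valid = int(frac[1] * len(rows))
        return {
            "train": rows[:n_train],
            "valid": rows[n_train:n_train + n_valid],
            "test": rows[n_train + n_valid:],
        }

    def get_approved_set(self):
        # The approved drugs never enter the splits above.
        return list(self.approved)


# Toy placeholder records, NOT real assay data.
main_rows = [{"Drug_ID": f"cpd_{i}", "Y": i % 2} for i in range(10)]
approved_rows = [{"Drug_ID": f"drug_{i}", "Y": i % 2} for i in range(3)]

data = RLMDatasetSketch(main=main_rows, approved=approved_rows)
split = data.get_split()
approved = data.get_approved_set()
```

Keeping the approved drugs out of `get_split()` is what lets them act as an independent validation set: a model trained on `split["train"]` has never seen them.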

Additional context

A GCNN model built on a much larger RLM dataset (only a subset was made public) is available here.

iwwwish, Jul 31 '23 22:07

Looks great! Is there a pull request that we can review to integrate this? Thanks!

kexinhuang12345, Sep 27 '23 03:09

@iwwwish, you can look at https://github.com/mims-harvard/TDC/pull/252 for how to add datasets to existing tasks. Please let us know if you have any questions.

amva13, Apr 23 '24 13:04