
Filtering False Positives from MoleculeNet datasets

Open · rbharath opened this issue 4 years ago · 1 comment

Several MoleculeNet datasets contain obvious false positives. For example, the HIV dataset labels a number of cytotoxic compounds as hits. There are likely also PAINS (pan-assay interference) compounds scattered throughout. These should be filtered out for the next version of MoleculeNet. (H/T @PatWalters for bringing this up.)

Let's use this issue to coordinate cleanup efforts and discussion. As a first point, it would probably be useful to keep the v1 MoleculeNet datasets around as a record. We could then add a version flag to the dc.molnet.load_X functions that returns either the v2 (filtered) dataset or the original v1 dataset.
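To make the proposal concrete, here is a minimal sketch of how a version flag could sit on top of a filtering step. Everything in it is hypothetical and is not the DeepChem API: Record, FalsePositiveCheck, make_v2 and load_versioned are illustrative names, the choice of "v2" as the default is an assumption, and the sketch assumes flagged hits are dropped (relabelling them as inactive would be another option). The false-positive check is passed in as a function, so the versioning logic stays independent of how compounds get flagged.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Record:
    """One compound/label pair from a hypothetical classification dataset."""
    smiles: str
    label: int  # 1 = reported hit, 0 = inactive


# Hypothetical check: returns True when a record looks like a false positive,
# e.g. a compound known to be cytotoxic or one matching a PAINS filter.
FalsePositiveCheck = Callable[[Record], bool]


def make_v2(v1: Sequence[Record], is_false_positive: FalsePositiveCheck) -> List[Record]:
    """Return a cleaned copy of v1 with suspected false-positive hits removed.

    Only records labelled as hits are checked; inactives pass through unchanged.
    The v1 sequence itself is never modified, so it stays available as a record.
    """
    return [r for r in v1 if not (r.label == 1 and is_false_positive(r))]


def load_versioned(
    v1: Sequence[Record],
    is_false_positive: FalsePositiveCheck,
    version: str = "v2",  # assumption: the filtered version is the default
) -> List[Record]:
    """Hypothetical stand-in for a load_X function with a version flag."""
    if version == "v1":
        return list(v1)  # original data, untouched
    if version == "v2":
        return make_v2(v1, is_false_positive)
    raise ValueError(f"unknown version {version!r}; expected 'v1' or 'v2'")
```

In practice the check could be backed by a curated list of known cytotoxic compounds, or by RDKit's FilterCatalog configured with its PAINS catalog; the sketch deliberately leaves that choice open.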

rbharath, Jun 26 '20 23:06

As a useful link, here's @PatWalters' blog post on filtering chemical libraries: http://practicalcheminformatics.blogspot.com/2018/08/filtering-chemical-libraries.html

rbharath, Jun 26 '20 23:06