moleculenet
Filtering False Positives from MoleculeNet datasets
A number of MoleculeNet datasets contain obvious false positives. For example, the HIV dataset labels a number of cytotoxic compounds as hits. There are likely PAINS (pan-assay interference) compounds scattered throughout as well. These should be filtered out for the next version of MoleculeNet. (H/T @PatWalters for bringing this up.)
Let's use this issue to coordinate cleanup efforts and discussion. As a first point, it would probably be useful to keep the v1 MoleculeNet datasets around as records. We could then add a version flag to the `dc.molnet.load_X` functions that returns either the v2 dataset or the v1 dataset.
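To make the version-flag idea concrete, here is a minimal sketch of how it could behave. Everything in it is hypothetical: `load_hiv_sketch`, the `version` parameter name, the toy records, and the `_FLAGGED` set are illustrative stand-ins, not the existing `dc.molnet` API or real HIV data. It also assumes that v2 simply drops flagged positives rather than relabeling them, which is still open for discussion.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (SMILES string, binary activity label)

# Toy stand-in for the original (v1) records; NOT real HIV dataset entries.
_V1_RECORDS: List[Record] = [
    ("CCO", 0),
    ("c1ccccc1", 1),
    ("CC(=O)O", 1),
]

# Hypothetical set of compounds identified as false positives
# (e.g. cytotoxic compounds or PAINS hits found during cleanup).
_FLAGGED = {"c1ccccc1"}


def load_hiv_sketch(version: str = "v2") -> List[Record]:
    """Return the toy dataset for the requested version (hypothetical API)."""
    if version == "v1":
        # v1 is kept unchanged so earlier results stay reproducible.
        return list(_V1_RECORDS)
    if version == "v2":
        # v2 drops positives that were flagged as false positives.
        return [
            (smiles, label)
            for smiles, label in _V1_RECORDS
            if not (label == 1 and smiles in _FLAGGED)
        ]
    raise ValueError(f"Unknown version {version!r}; expected 'v1' or 'v2'")
```

Whether the default should be the cleaned v2 (as assumed here) or v1 for backward compatibility is a design choice worth settling in this thread.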
As a useful reference, here's @PatWalters's blog post about filtering chemical libraries: http://practicalcheminformatics.blogspot.com/2018/08/filtering-chemical-libraries.html