Potential Processing Error in the Original QM8 Dataset on Some Tasks

Open rbharath opened this issue 2 years ago • 0 comments

There is a potential error in the QM8 dataset from the original MoleculeNet paper: some task columns appear to be duplicated, possibly because of a pandas data-processing error.
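To check a local copy of the data for this kind of problem, a minimal sketch like the one below could help. It is an illustration under stated assumptions, not the fix in the PR. The function name `find_suspect_columns` and the toy column names (`task_a`, `task_b`, `task_b.1`) are hypothetical, and it uses only the Python standard library, so it runs without pandas. It flags three things: header names repeated verbatim, names with a `.N` suffix whose base name is also a header (the pattern pandas `read_csv` uses when it renames duplicate headers), and pairs of columns whose values match in every row.

```python
import csv
import io
import re
from itertools import combinations


def find_suspect_columns(csv_text):
    """Hypothetical helper: flag likely duplicate columns in a CSV given as text.

    Returns (repeated, mangled, identical).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    # Keep columns by position so literal duplicate header names are not collapsed.
    columns = [[row[i] for row in rows] for i in range(len(header))]

    # Header names that appear more than once verbatim.
    repeated = sorted({name for name in header if header.count(name) > 1})

    # Names like "X.1" whose base "X" is also a header: the renaming
    # pattern pandas read_csv applies to duplicate header names.
    names = set(header)
    mangled = [
        name
        for name in header
        if (m := re.fullmatch(r"(.+)\.\d+", name)) and m.group(1) in names
    ]

    # Column pairs whose raw text values are equal in every row
    # (no numeric parsing, so "0.2" and "0.20" count as different).
    identical = [
        (header[i], header[j])
        for i, j in combinations(range(len(header)), 2)
        if columns[i] == columns[j]
    ]
    return repeated, mangled, identical


# Toy data (hypothetical, not QM8 values): task_b.1 is an exact copy of task_b.
sample = """smiles,task_a,task_b,task_b.1
C,0.10,0.20,0.20
CC,0.30,0.40,0.40
"""

repeated, mangled, identical = find_suspect_columns(sample)
print(repeated)   # []
print(mangled)    # ['task_b.1']
print(identical)  # [('task_b', 'task_b.1')]
```

Comparing raw values catches exact copies; two columns that are genuinely different but numerically very close, as two DFT runs with different basis sets would be, will correctly not be reported as identical.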

https://github.com/deepchem/deepchem/issues/2747

We are still working to verify the error, but in the meantime a fix PR is under review that you can use:

https://github.com/deepchem/deepchem/pull/2756

Assuming the error is confirmed, the QM8 benchmark numbers may need to be rerun. However, the duplicated columns correspond to two very similar tasks: both predict DFT results for the same molecule, computed with the same functional but different basis sets. I therefore suspect the qualitative changes will be relatively minor, since in effect models have been predicting one DFT run twice instead of two slightly different DFT runs.

rbharath, Nov 15 '21 17:11