
Add "informative" set (subset) from Ehrman

Open davidlmobley opened this issue 6 years ago • 0 comments

Jordan Ehrman's analysis of eMolecules identified molecules whose minimized geometries differ substantially between force fields. He is still finishing a final pass, but I can go ahead and provide a first batch (from the first portion of eMolecules). At present I have 9k compounds; after filtering for < 3 rotatable bonds, that leaves 1,117 compounds with pretty diverse chemistry. I'll put in a PR for this set.
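For anyone wanting to reproduce the rotatable-bond cut described above, here is a minimal sketch of the filtering step. It assumes the rotatable-bond count for each compound has already been computed with a cheminformatics toolkit; the `Compound` record, the `filter_by_rotatable_bonds` helper, and the example entries are all hypothetical placeholders, not the actual pipeline or data used for this set.

```python
from typing import Iterable, List, NamedTuple


class Compound(NamedTuple):
    """Hypothetical record: a SMILES string plus a precomputed rotor count."""
    smiles: str
    n_rotatable_bonds: int  # assumed to come from a cheminformatics toolkit


def filter_by_rotatable_bonds(
    compounds: Iterable[Compound], max_exclusive: int = 3
) -> List[Compound]:
    """Keep compounds with strictly fewer than `max_exclusive` rotatable bonds."""
    return [c for c in compounds if c.n_rotatable_bonds < max_exclusive]


# Illustrative placeholder entries, not taken from the eMolecules subset.
example = [
    Compound("CCO", 0),
    Compound("CCCCCC", 3),
    Compound("c1ccccc1CC", 1),
]

kept = filter_by_rotatable_bonds(example)
for compound in kept:
    print(compound.smiles, compound.n_rotatable_bonds)
```

The threshold is strict (`<`), matching the "< 3 rotatable bonds" wording, so a compound with exactly 3 rotatable bonds is dropped.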

It would be good to run this as an optimization dataset, without fragmentation, for our testing/benchmarking, though it is lower priority than the Genentech and Pfizer sets. I'd also potentially like to get one set of drug fragments into the queue ahead of it.

davidlmobley • Sep 07 '19 03:09