qca-dataset-submission
Add "informative" set (subset) from Ehrman
Jordan Ehrman's analysis of eMolecules identified molecules whose minimized geometries differ substantially between force fields. He is still finishing a final pass, but I can go ahead and provide a first batch (from the first portion of eMolecules). At present I have 9k compounds; after filtering for < 3 rotatable bonds, that leaves 1,117 compounds with pretty diverse chemistry. I'll put in a PR for this set.
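For reference, the rotatable-bond filter can be sketched roughly as below. This is only an illustration, not the script actually used: `filter_by_rotatable_bonds`, `count_rotatable_bonds`, and `example_counts` are hypothetical names, and the counts in the example are placeholders rather than toolkit output. In practice the counter would wrap a cheminformatics toolkit call (for example RDKit's `rdMolDescriptors.CalcNumRotatableBonds`), and the exact count depends on that toolkit's definition of a rotatable bond.

```python
from typing import Callable, Iterable, List


def filter_by_rotatable_bonds(
    smiles: Iterable[str],
    count_rotatable_bonds: Callable[[str], int],
    max_exclusive: int = 3,
) -> List[str]:
    """Keep molecules with strictly fewer than `max_exclusive` rotatable bonds.

    `count_rotatable_bonds` is a hypothetical callable supplied by the caller;
    it maps a SMILES string to its rotatable-bond count.
    """
    return [smi for smi in smiles if count_rotatable_bonds(smi) < max_exclusive]


# Placeholder counts for illustration only (assumed, not computed here).
example_counts = {"c1ccccc1": 0, "CCCC": 1, "CCCCCC": 3}

# Iterating the dict yields its SMILES keys; the lookup stands in for a toolkit call.
kept = filter_by_rotatable_bonds(example_counts, example_counts.__getitem__)
print(kept)
```

Passing the counter in as a callable keeps the threshold logic independent of whichever toolkit ends up computing the counts.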
It would be good to run this as an optimization dataset, without fragmentation, for our testing/benchmarking, though it is lower priority than the Genentech and Pfizer sets. I'd also potentially like to get one set of drug fragments into the queue ahead of it.