qca-dataset-submission
Potential dataset: Enamine REAL
There is enormous interest in gigadocking and free energy calculations with Enamine REAL, which is a large virtual purchasable library of up to 11 billion molecules at current count.
There are only ~73K compounds in the "building blocks" subset, which might provide good coverage of much of the chemistry in the database.
Other downloadable Enamine REAL subsets can be found on this page.
On closer inspection, it may be a better idea not to use the "building blocks" subset but instead to fragment larger purchasable compound sets and eliminate duplicate fragments.
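A minimal sketch of the fragment-then-deduplicate idea, in case it helps make the proposal concrete. Everything here is an assumption for illustration: `unique_fragments`, `toy_fragment`, and the `canonicalize` hook are hypothetical names, and the toy fragmenter only splits a SMILES string on `.`. A real workflow would plug in a cheminformatics toolkit's fragmentation scheme and canonical SMILES writer instead.

```python
from typing import Callable, Iterable, List


def unique_fragments(
    smiles_iter: Iterable[str],
    fragment: Callable[[str], Iterable[str]],
    canonicalize: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Fragment each molecule and keep each distinct fragment once.

    `fragment` and `canonicalize` are supplied by the caller; duplicates
    are detected by comparing the canonicalized fragment strings.
    """
    seen = set()
    unique = []
    for smiles in smiles_iter:
        for frag in fragment(smiles):
            key = canonicalize(frag)
            if key not in seen:
                seen.add(key)
                unique.append(key)  # first-seen order is preserved
    return unique


def toy_fragment(smiles: str) -> List[str]:
    """Placeholder fragmenter (assumption): split on '.' only.

    This is NOT a chemically meaningful fragmentation scheme; it just
    stands in for a real one so the sketch runs without dependencies.
    """
    return [part for part in smiles.split(".") if part]


if __name__ == "__main__":
    compounds = ["CCO.CC", "CC.c1ccccc1"]
    print(unique_fragments(compounds, toy_fragment))
```

Deduplication is only as good as the canonicalization step: with the identity function used as the default here, two different SMILES strings for the same fragment would be counted twice.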
If I were picking compound subsets, I'd tackle the following in order:
- Enamine Diversity Discovery Set (which has building blocks assembled in a diverse set of ways): 50K compounds
- Enamine Building Blocks Subset: 124K compounds
- Enamine Hit Locator Library (which has building blocks assembled in a diverse set of ways): 234K compounds