qca-dataset-submission
Potential dataset: SureChEMBL
SureChEMBL covers the chemical space of molecules patented by our pharma partners.
It looks like SureChEMBL can be downloaded in SDF or SMILES form: https://disco.chemaxon.com/products/madfast/latest/doc/prepare-molecules.html#tocid-10
The data lives here: ftp://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBL/data/
A bit more information on this:
Retrieval of SureChEMBL data is described here.
Even more useful is the SureChEMBL map data, which contains SMILES and patent numbers, and is described here. If we have the patent numbers of interest from our partners, we can easily slice out the molecules we want to process.
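A rough sketch of what that slicing could look like, assuming the map file is tab-separated with one molecule/patent pair per line. The column positions (`SMILES_COL`, `PATENT_COL`), the function name `slice_by_patent`, and the sample rows are all assumptions made for illustration; check them against the map file documentation before relying on this.

```python
from typing import Iterable, Iterator, Set, Tuple

# ASSUMPTION: zero-based column positions in the tab-separated map file.
# These are not confirmed by this issue; adjust to the real layout.
SMILES_COL = 1
PATENT_COL = 4


def slice_by_patent(
    lines: Iterable[str],
    patents: Set[str],
    smiles_col: int = SMILES_COL,
    patent_col: int = PATENT_COL,
) -> Iterator[Tuple[str, str]]:
    """Yield unique (patent, SMILES) pairs whose patent is in `patents`."""
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or truncated lines that lack the needed columns.
        if len(fields) <= max(smiles_col, patent_col):
            continue
        pair = (fields[patent_col], fields[smiles_col])
        if pair[0] in patents and pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    # Made-up rows in the assumed layout, only to show the call.
    sample = [
        "SCHEMBL1\tCCO\tKEY1\t1\tXX-0000001-A1\t2015-01-01\t1\t1\n",
        "SCHEMBL2\tc1ccccc1\tKEY2\t1\tXX-0000002-A1\t2015-01-01\t1\t1\n",
    ]
    for patent, smiles in slice_by_patent(sample, {"XX-0000001-A1"}):
        print(patent, smiles)
```

Because the function only iterates over lines, a compressed map file can be streamed with `gzip.open(path, "rt")` from the standard library instead of being loaded into memory.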