qca-dataset-submission

Potential dataset: SureChEMBL

Open jchodera opened this issue 6 years ago • 2 comments

SureChEMBL covers the space of patented molecules from our pharma partners.

It looks like SureChEMBL can be downloaded in SDF or SMILES form: https://disco.chemaxon.com/products/madfast/latest/doc/prepare-molecules.html#tocid-10

The data lives here: ftp://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBL/data/

jchodera, Jul 02 '19 23:07

A bit more information on this:

Retrieval of SureChEMBL data is described here.

Even more useful is the SureChEMBL map data, which contains SMILES and patent numbers, and is described here. If we have the patent numbers of interest from our partners, we can easily slice out the molecules we want to process.
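A rough sketch of that slicing step, in Python. It assumes the map files are tab-separated with SMILES in the second column and the patent ID in the fifth; that layout is an assumption and should be checked against the actual files before use. The function names (`slice_by_patent`, `read_map_file`) and the example patent IDs are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List, Set, Tuple


def slice_by_patent(
    rows: Iterable[List[str]],
    patents: Set[str],
    smiles_col: int = 1,   # assumed position of the SMILES column
    patent_col: int = 4,   # assumed position of the patent ID column
) -> Iterator[Tuple[str, str]]:
    """Yield unique (patent_id, smiles) pairs for the patents of interest."""
    seen: Set[Tuple[str, str]] = set()
    needed = max(smiles_col, patent_col)
    for row in rows:
        if len(row) <= needed:
            continue  # skip short or malformed rows
        pair = (row[patent_col], row[smiles_col])
        if pair[0] in patents and pair not in seen:
            seen.add(pair)
            yield pair


def read_map_file(path: str) -> Iterator[List[str]]:
    """Stream rows from a tab-separated map file without loading it all."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


if __name__ == "__main__":
    # In-memory rows stand in for a real map file; the values are made up.
    demo_rows = [
        ["SCHEMBL1", "CCO", "KEY1", "1", "US-0000001-A1", "2019-01-01", "1", "1"],
        ["SCHEMBL2", "c1ccccc1", "KEY2", "1", "EP-0000002-A1", "2019-01-02", "1", "1"],
    ]
    for patent_id, smiles in slice_by_patent(demo_rows, {"US-0000001-A1"}):
        print(patent_id, smiles)
```

For a real file, `slice_by_patent(read_map_file(path), patents)` streams the rows, so the full map never has to fit in memory.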

jchodera, Jul 05 '19 18:07