qca-dataset-submission

Data generation and submission scripts for the QCArchive ecosystem.

Results 75 qca-dataset-submission issues

The [PDB Ligand Expo](http://ligand-expo.rcsb.org/) contains ~30K small molecules that appear in the PDB. This would be a good set to ensure we have adequate coverage of chemical space, since these...

suggested dataset

[BindingDB](https://www.bindingdb.org/bind/ByPatent.jsp) lets you query molecule sets by the patents the data was curated from, with a field giving the name of the filing organization. If we can...

suggested dataset

We should migrate the Roche fragment set torsion drive input scripts here from wherever they are.

Here's some chemistry covered by MMFF94 and GAFF/GAFF2 that we do not currently cover and that is not represented in our datasets:

- `[#8]~[#35]`: O-Br single bonds are present...

reviewed-2025

We should fall back to the RDKit conformer generator (specifically ETKDG, the experimental-torsion knowledge distance geometry method) in cases where Omega fails to generate initial conformers for torsion drives and valence...
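A minimal sketch of the fallback logic proposed above, assuming each conformer generator can be wrapped as a callable that takes a SMILES string and returns a list of conformers. The function name `generate_conformers_with_fallback` and the two stub generators are hypothetical stand-ins; in practice the first entry would wrap Omega and the second would wrap RDKit's ETKDG embedding.

```python
from typing import Callable, List, Sequence, Tuple

# A conformer is represented here as a plain list of (x, y, z) coordinates.
Conformer = List[Tuple[float, float, float]]
Generator = Callable[[str], List[Conformer]]


def generate_conformers_with_fallback(
    smiles: str,
    generators: Sequence[Tuple[str, Generator]],
) -> Tuple[str, List[Conformer]]:
    """Try each named generator in order and return the first non-empty result."""
    errors = []
    for name, generate in generators:
        try:
            conformers = generate(smiles)
        except Exception as exc:  # a failing generator should not abort the run
            errors.append(f"{name}: {exc}")
            continue
        if conformers:
            return name, conformers
        errors.append(f"{name}: returned no conformers")
    raise RuntimeError(f"all generators failed for {smiles!r}: " + "; ".join(errors))


# Hypothetical stand-ins for the real generators (assumptions, not real APIs).
def omega_stub(smiles: str) -> List[Conformer]:
    raise RuntimeError("Omega failed to generate conformers")


def etkdg_stub(smiles: str) -> List[Conformer]:
    return [[(0.0, 0.0, 0.0)]]


used, conformers = generate_conformers_with_fallback(
    "CCO", [("omega", omega_stub), ("rdkit-etkdg", etkdg_stub)]
)
print(used, len(conformers))
```

Returning the name of the generator that succeeded makes it possible to record which molecules needed the fallback.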

The "OpenFF Group1 Torsions" dataset was submitted a while ago. Some jobs errored because of difficulties with full 360° torsion scans. @dgasmith and I tried to fix the...

[This link](https://disco.chemaxon.com/products/madfast/latest/doc/prepare-molecules.html) has a number of good (but large) open molecule datasets we might consider in the future.

suggested dataset

[SureChEMBL](https://www.surechembl.org) covers the space of patented molecules from our pharma partners. It looks like SureChEMBL can be downloaded in SDF or SMILES form: https://disco.chemaxon.com/products/madfast/latest/doc/prepare-molecules.html#tocid-10 The data lives here: ftp://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBL/data/

suggested dataset
reviewed-2025

This issue keeps track of chemistries that break common workflows, what their failures are, and whether they were resolved.

## Segfaults

#### Failure at `oequacpac.OEEnumerateFormalCharges()` (used to expand...

The MLPepper RECP fragment dataset for ML partial charge training; see the readme for more details.

## New Submission Checklist

- [x] Created a new folder in the submissions directory...