Make create_basic_dataset reuse the CMILES from an OptimizationDataset (if present) to avoid issues where cheminformatics toolkit behavior has changed
As Lily pointed out here:
there's no guarantee that running to_smiles will give you the same CMILES for the same molecule across different OpenEye/RDKit versions, especially if you have one toolkit installed but another was used to generate the source datasets. It would be slightly more robust to use the exact CMILES stored in the dataset result.
Basically, the new version of create_basic_dataset (and likely other parts of qcsubmit) reconstructs a CMILES for each Molecule instead of reusing the CMILES on the OptimizationRecord. I don't think this causes any (known) functional issues, but it could lead to situations where two "identical" datasets have different CMILES.
This call to to_smiles is the root of the issue in this case. It could be replaced by a dict lookup from record_id to cmiles, built from self.entries earlier in the function.
https://github.com/openforcefield/openff-qcsubmit/blob/d4e6b6986a58f5cf0184ba14dc4f7419e9978b67/openff/qcsubmit/results/results.py#L706-L715
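
A rough sketch of the lookup idea, not the actual qcsubmit code. The Entry class, build_cmiles_lookup, and cmiles_for_record are hypothetical names made up for illustration, and the sketch assumes each entry exposes a record_id and an optional cmiles. It treats the entries as a flat list, so the real layout of self.entries may need flattening first. The regenerate callable stands in for the existing to_smiles call and is used only as a fallback when no stored CMILES is available.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a result entry; not the qcsubmit class."""

    record_id: str
    cmiles: Optional[str] = None


def build_cmiles_lookup(entries: Iterable[Entry]) -> Dict[str, str]:
    """Map record_id -> stored CMILES, skipping entries without one."""
    return {entry.record_id: entry.cmiles for entry in entries if entry.cmiles}


def cmiles_for_record(
    record_id: str,
    lookup: Dict[str, str],
    regenerate: Callable[[], str],
) -> str:
    """Prefer the stored CMILES; regenerate only if none was stored.

    `regenerate` is a placeholder for the current to_smiles call.
    """
    stored = lookup.get(record_id)
    if stored is not None:
        return stored
    return regenerate()
```

With this shape, the toolkit is only consulted for records that have no CMILES on file, so two datasets built from the same source entries would carry the same strings regardless of which toolkit version is installed.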