QCFractal
Wrong typecasting in records
Describe the bug
As I mentioned in the meeting the other day, I came across what I think are a few bugs in the records of the SPICE single point datasets on the ML server. The problem seems to affect only the "spec_6" data, for the following properties:
current energy <class 'str'>
dispersion correction energy <class 'str'>
2-body dispersion correction energy <class 'str'>
b3lyp-d3(bj) dispersion correction energy <class 'str'>
For this dataset, it appears those 4 properties all store the same energy (and it is identical to 'return_energy', which is properly typed as a float). I'll note that the list-valued fields (e.g., those related to gradients) are correctly constructed from floats.
The following datasets have this issue for spec_6:
SPICE Solvated Amino Acids Single Points Dataset v1.0 spec_6
SPICE DES Monomers Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.0 spec_6
SPICE Dipeptides Single Points Dataset v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.1 spec_6
SPICE DES Monomers Single Points Dataset v1.1 spec_6
SPICE Dipeptides Single Points Dataset v1.1 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.0 spec_6
SPICE Solvated Amino Acids Single Points Dataset v1.1 spec_6
SPICE DES370K Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.2 spec_6
SPICE Dipeptides Single Points Dataset v1.2 spec_6
SPICE DES370K Single Points Dataset Supplement v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.2 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.1 spec_6
To Reproduce
Here is a quick script that loops over everything:

    from qcportal import PortalClient

    client = PortalClient("ml.qcarchive.molssi.org")
    dataset_type = "singlepoint"

    # Collect the names of all SPICE single point datasets on the server.
    datasets = client.list_datasets()
    datasets_to_consider = []
    for dataset in datasets:
        if dataset['dataset_type'] == dataset_type:
            if 'SPICE' in dataset['dataset_name']:
                datasets_to_consider.append(dataset['dataset_name'])

    spec = 'spec_6'
    for dataset_name in datasets_to_consider:
        ds = client.get_dataset(
            dataset_type=dataset_type, dataset_name=dataset_name
        )
        entry_names = ds.entry_names
        max_val = 1  # only check the first entry of each dataset
        for record in ds.iterate_records(entry_names[0:max_val], specification_names=[spec]):
            # Each item is an (entry name, specification name, record) tuple.
            properties = record[2].dict()['properties']
            has_strings = False
            for k in properties.keys():
                if isinstance(properties[k], str):
                    has_strings = True
                    # print(k, type(properties[k]))
            if has_strings:
                print(f'{dataset_name} {spec}')
This seems to apply only to DFTD3 calculations, where the values are converted to strings: https://github.com/MolSSI/QCEngine/blob/1b27a14255817f13092ae846593b0fb7c975625b/qcengine/programs/dftd3.py#L273C41-L273C41
@loriab is looking to clean that up in qcengine soon. I can convert the existing values in the database next week.
(The DFTD3 calculations come from specifying b3lyp-d3 calculations. In the legacy version, this caused two separate records/specifications to be created - one for b3lyp and one for the d3 correction. The new version makes these existing records explicit, but no longer does the splitting for new calculations. It's a bit complicated...)
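Until the stored values are converted, one possible client-side workaround is to coerce float-like strings back to floats after fetching a record's properties. The sketch below is only an illustration: `coerce_numeric_strings` is a hypothetical helper (not part of qcportal or QCEngine), and the dictionary is made-up sample data shaped like the properties described above, not values taken from the server.

```python
from typing import Any, Dict


def coerce_numeric_strings(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `properties` with float-like strings converted to float.

    Hypothetical helper for illustration; not part of qcportal.
    """
    fixed = {}
    for key, value in properties.items():
        if isinstance(value, str):
            try:
                fixed[key] = float(value)
            except ValueError:
                # Leave genuinely non-numeric strings untouched.
                fixed[key] = value
        else:
            # Floats, lists (e.g. gradients), etc. pass through unchanged.
            fixed[key] = value
    return fixed


# Made-up sample data mimicking the mistyped spec_6 properties.
example = {
    "return_energy": -0.0123,
    "current energy": "-0.0123",
    "dispersion correction energy": "-0.0123",
    "note": "not a number",
    "current gradient": [0.0, 0.1, -0.1],
}
cleaned = coerce_numeric_strings(example)
```

In the loop from the reproduction script, this could be applied to `record[2].dict()['properties']` before the values are used downstream. It does not touch the records stored on the server.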