`e3fp` for diatomic molecules
Thanks for making this nice tool for the community.
I ran into problems computing e3fp fingerprints for diatomic molecules such as H2, O2, and CO. Here is the code I ran, followed by the resulting error output:
from e3fp.pipeline import confs_from_smiles, fprints_from_mol
# configurations
confgen_params = {"max_energy_diff": 20.0, "first": 3}
fprint_params = {"bits": 4096, "radius_multiplier": 1.5, "rdkit_invariants": True}
# build molecular conformer
mol = confs_from_smiles("[HH]", "h2_gas", confgen_params=confgen_params)
# compute the fingerprint
fprints = fprints_from_mol(mol, fprint_params=fprint_params)
RDKit WARNING: [19:42:12] WARNING: not removing hydrogen atom without neighbors
2021-08-16 19:42:12,640|INFO|Generating conformers for h2_gas.
2021-08-16 19:42:12,662|INFO|Generated 1 conformers for h2_gas.
2021-08-16 19:42:12,664|INFO|Generating fingerprints for h2_gas.
2021-08-16 19:42:12,666|ERROR|Error generating fingerprints for h2_gas.
Traceback (most recent call last):
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/generate.py", line 188, in fprints_dict_from_mol
fingerprinter.run(conf, mol)
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 181, in run
self.initialize_conformer(conf)
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 262, in initialize_conformer
bound_atoms_dict=self.bound_atoms_dict,
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/fprinter.py", line 547, in __init__
self.distance_matrix = array_ops.make_distance_matrix(atom_coords)
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/fingerprint/array_ops.py", line 57, in make_distance_matrix
return squareform(pdist(coords))
File "/home/legend/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2018, in pdist
raise ValueError('A 2-dimensional array must be passed.')
ValueError: A 2-dimensional array must be passed.
-------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-99e54f4a484f> in <module>
8 mol = confs_from_smiles("[HH]", "h2_gas", confgen_params=confgen_params)
9 # compute the fingerprint
---> 10 fprints = fprints_from_mol(mol, fprint_params=fprint_params)
~/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/pipeline.py in fprints_from_mol(mol, fprint_params, save)
57 fprints_dict = fprints_dict_from_mol(mol, save=save, **fprint_params)
58 level = fprint_params.get("level", -1)
---> 59 fprints_list = fprints_from_fprints_dict(fprints_dict, level=level)
60 return fprints_list
61
~/softs/miniconda3/envs/chem_py37/lib/python3.7/site-packages/e3fp/pipeline.py in fprints_from_fprints_dict(fprints_dict, level)
48 """Get fingerprint at `level` from dict of level to fingerprint."""
49 fprints_list = fprints_dict.get(
---> 50 level, fprints_dict[max(fprints_dict.keys())]
51 )
52 return fprints_list
ValueError: max() arg is an empty sequence
Do we have a fix for this? Thank you!
Hi, I'm very sorry for the late reply to this issue. I was only partially able to reproduce the error:
>>> import e3fp
>>> from e3fp.pipeline import fprints_from_mol, confs_from_smiles
>>> smiles_dict = {"h2": "[HH]", "o2": "O=O", "co": "[C-]#[O+]"}
>>> confgen_params = {'max_energy_diff': 20.0, 'first': 3}
>>> fprint_params = {"bits": 4096, "radius_multiplier": 1.5, "rdkit_invariants": True}
>>> mol = confs_from_smiles(smiles_dict["o2"], "o2", confgen_params=confgen_params)
2022-06-02 02:24:11,639|INFO|Generating conformers for o2.
2022-06-02 02:24:11,648|INFO|Generated 1 conformers for o2.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:19,635|INFO|Generating fingerprints for o2.
2022-06-02 02:24:19,640|INFO|Generated 1 fingerprints for o2.
>>> mol = confs_from_smiles(smiles_dict["co"], "co", confgen_params=confgen_params)
2022-06-02 02:24:29,416|INFO|Generating conformers for co.
2022-06-02 02:24:29,422|INFO|Generated 1 conformers for co.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:31,869|INFO|Generating fingerprints for co.
2022-06-02 02:24:31,873|INFO|Generated 1 fingerprints for co.
>>> mol = confs_from_smiles(smiles_dict["h2"], "h2", confgen_params=confgen_params)
[02:24:41] WARNING: not removing hydrogen atom without neighbors
2022-06-02 02:24:41,237|INFO|Generating conformers for h2.
2022-06-02 02:24:41,244|INFO|Generated 1 conformers for h2.
>>> fprints = fprints_from_mol(mol, fprint_params=fprint_params)
2022-06-02 02:24:42,228|INFO|Generating fingerprints for h2.
2022-06-02 02:24:42,229|ERROR|Error generating fingerprints for h2.
Traceback (most recent call last):
...
That is, I had no issues fingerprinting O2 and CO; only H2 fails.
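For reference, the two exceptions in the original traceback can be reproduced in isolation, outside e3fp. This is only a sketch under two assumptions that have not been checked against the e3fp source: that the coordinate array passed to `make_distance_matrix` is empty (and therefore 1-dimensional) for H2, and that the failed run leaves the level-to-fingerprint dict empty. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Assumption: no atoms are kept for H2, so the coordinate array is
# empty and has shape (0,) instead of the expected (n_atoms, 3).
atom_coords = np.array([])

first_error = None
try:
    squareform(pdist(atom_coords))
except ValueError as err:
    # pdist rejects any input that is not 2-dimensional.
    first_error = err

# Assumption: after the logged error, no fingerprints were stored,
# so the dict mapping level -> fingerprints is empty.
fprints_dict = {}

second_error = None
try:
    max(fprints_dict.keys())
except ValueError as err:
    # max() of an empty iterable without a default raises ValueError.
    second_error = err

print(type(first_error).__name__, first_error)
print(type(second_error).__name__, second_error)
```

If these assumptions hold, the second `ValueError` is only a follow-on of the first: the fingerprinting error is logged rather than raised, and the empty dict then trips `max()`.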
In general, diatomic molecules should be supported by e3fp. If I had to guess, H2 fails because we never use the atomic coordinates of hydrogens for fingerprinting. For a molecule that is pure hydrogen (i.e. just this molecule and protons), that leaves no atoms to fingerprint, so fingerprinting of course fails. Here we could either
- explicitly add and use hydrogens, or
- return a fingerprint with no "on" bits.
While the latter seems preferable for consistency, if it produces non-unit values for fingerprint metrics between the fingerprints of 2 hydrogen molecules (need to check), I think this would not be ideal. @mjke what do you think?
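To make the concern about the second option concrete: with a Tanimoto-style metric, two fingerprints with no "on" bits give 0/0, so the result depends entirely on what convention the implementation picks. The sketch below is a standalone illustration, not e3fp's metric code; `tanimoto` and its `empty_value` parameter are hypothetical names, and how e3fp's own metrics treat this case still needs checking.

```python
def tanimoto(on_bits_a, on_bits_b, empty_value=None):
    """Tanimoto similarity between two sets of "on" bit indices.

    `empty_value` is a hypothetical knob: the value to return when both
    fingerprints are empty and the ratio is undefined (0 / 0).
    """
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    if not union:
        # Neither fingerprint has any "on" bits: similarity is undefined.
        return empty_value
    return len(a & b) / len(union)


# Two hypothetical all-zero fingerprints, e.g. for two H2 molecules.
h2_a, h2_b = set(), set()

undefined = tanimoto(h2_a, h2_b)                   # no convention chosen
as_unit = tanimoto(h2_a, h2_b, empty_value=1.0)    # "identical" convention
ordinary = tanimoto({1, 2, 3}, {2, 3, 4})          # non-empty case
```

Under the "identical" convention, two empty fingerprints compare as 1.0, which matches the intuition that H2 is identical to H2; under a 0.0 convention they would not, which is the non-ideal outcome described above.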