ANI1x_datasets
Missing Quadrupole Constants?
Hello!
Thank you so much for making the ANI-1x dataset available; it is a fantastic resource. I have a question about the availability of quadrupoles for the molecules/conformers in the dataset. According to the paper, the 'wb97x_dz.quadrupole' key should contain an array of size $N_c \times 6$, where $N_c$ is the number of conformers per molecule. When I look at this array, however, a significant number of rows are full of nan. I ran the following code snippet:
import h5py
import numpy as np
# Open the release file read-only
ani1x_data = h5py.File('ani-1x/ani1x-release.h5', 'r')
frac_quads_li = []
for i in ani1x_data.keys():
    all_quads = ani1x_data[i]['wb97x_dz.quadrupole'][()]  # shape (N_c, 6)
    # Row indices of conformers with at least one non-NaN component
    all_quads_sub = np.unique(np.argwhere(~np.isnan(all_quads))[:, 0])
    frac_quads_li.append(float(len(all_quads_sub)) / len(all_quads))
print(f'Avg Fraction Computed Quads: {round(np.average(frac_quads_li), 3)}')
print(f'No Quad Count: {np.sum(np.array(frac_quads_li) == 0.0)}/{len(frac_quads_li)}')
...and got the following result:
Avg Fraction Computed Quads: 0.215
No Quad Count: 1698/3114
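For anyone reproducing these counts, the same per-molecule fractions can be computed with a row-wise mask, which also makes it easy to repeat the check for 'wb97x_dz.dipole'. This is a minimal sketch, assuming each per-conformer array has the conformer index on its first axis. `summarize_missing` is a hypothetical helper, and the small dictionary stands in for the h5py groups so the example runs without the release file.

```python
import numpy as np


def summarize_missing(groups, key):
    """Summarize NaN coverage of `key` across molecules.

    `groups` maps a molecule name to a dict-like object holding arrays
    (an opened h5py.File should behave the same way). A conformer counts
    as "computed" if any component of its row is not NaN, matching the
    np.argwhere/np.unique logic of the snippet above.
    Returns (average fraction computed, molecules with none, molecules).
    """
    fractions = []
    for name in groups:
        values = np.asarray(groups[name][key])
        # Flatten trailing component axes so dipoles (N_c x 3) and
        # quadrupoles (N_c x 6) are handled identically.
        rows = values.reshape(len(values), -1)
        has_value = np.any(~np.isnan(rows), axis=1)
        fractions.append(has_value.mean())
    fractions = np.array(fractions)
    avg_fraction = float(np.mean(fractions))
    n_empty = int(np.sum(fractions == 0.0))
    return avg_fraction, n_empty, len(fractions)


# Toy stand-in data (hypothetical molecules, not taken from ANI-1x)
nan = np.nan
toy = {
    "mol_a": {"wb97x_dz.quadrupole": np.array([[1.0] * 6, [nan] * 6])},
    "mol_b": {"wb97x_dz.quadrupole": np.full((3, 6), nan)},
}
print(summarize_missing(toy, "wb97x_dz.quadrupole"))  # (0.25, 1, 2)
```

Against the real file, the call would presumably be `summarize_missing(h5py.File('ani-1x/ani1x-release.h5', 'r'), 'wb97x_dz.dipole')`, swapping the key to check dipoles instead of quadrupoles.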
So it appears that quite a few conformers have all-nan quadrupoles, and more than half of the molecules (1698 of 3114) have no quadrupole information at all. When I ran the same analysis on 'wb97x_dz.dipole', I found that 181 molecules have no dipole values for any of their conformers. I did not find anything in the publication or the GitHub repo that mentions these nan values (although I may have missed it).
So I am wondering what happened to the dipoles/quadrupoles in these cases, and whether there is a version of the ANI-1x dataset that contains these values. If not, that is fine: I am happy to recalculate them, or to omit the corresponding conformers for the analysis I am trying to do. But if a shareable version with these additional values is available, I would appreciate it, as it would save me some time and compute.
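If the affected conformers are simply omitted, one way to do it is to build a per-molecule boolean mask and apply it to every per-conformer array. This is a minimal sketch under stated assumptions: `drop_missing_conformers` is a hypothetical helper; the per-conformer keys are passed explicitly, and 'coordinates' is assumed (not confirmed here) to be a per-conformer array whose first axis is the conformer index; per-molecule arrays such as atomic numbers are left untouched. It requires every component to be finite, which is stricter than the any-component check in the snippet above.

```python
import numpy as np


def drop_missing_conformers(group, required_keys, per_conformer_keys):
    """Return copies of per-conformer arrays, keeping only conformers
    for which every component of every `required_keys` array is finite.

    `group` is any dict-like mapping of key -> array (e.g. one molecule
    group of an opened h5py.File). All keys in `per_conformer_keys` are
    assumed to share the conformer axis as their first dimension.
    """
    n_conf = len(group[required_keys[0]])
    keep = np.ones(n_conf, dtype=bool)
    for key in required_keys:
        values = np.asarray(group[key]).reshape(n_conf, -1)
        # A conformer survives only if this property is fully finite
        keep &= np.all(np.isfinite(values), axis=1)
    return {key: np.asarray(group[key])[keep] for key in per_conformer_keys}


# Toy stand-in for one molecule group (hypothetical values)
nan = np.nan
mol = {
    "coordinates": np.zeros((3, 2, 3)),
    "wb97x_dz.dipole": np.array([[0.1, 0.0, 0.0], [nan, nan, nan], [0.2, 0.0, 0.0]]),
    "wb97x_dz.quadrupole": np.array([[1.0] * 6, [1.0] * 6, [nan] * 6]),
}
kept = drop_missing_conformers(
    mol,
    required_keys=["wb97x_dz.dipole", "wb97x_dz.quadrupole"],
    per_conformer_keys=["coordinates", "wb97x_dz.dipole", "wb97x_dz.quadrupole"],
)
print(kept["coordinates"].shape)  # (1, 2, 3)
```

Only the first toy conformer has both a finite dipole and a finite quadrupole, so one conformer remains.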
Thank you for your time,
Marcus Schwarting