mdanalysis
mdanalysis copied to clipboard
`H5MD` Can not read `observables/<group>/<observable>/...`
Expected behavior
The H5MD standard defines two main ways to define observables. Either directly as group below the root observables or as a group specific observable observables/group.
observables
\-- <observable1>
| \-- step: Integer[variable]
| \-- time: Float[variable]
| \-- value: <type>[variable]
\-- <group1>
| \-- <observable2>
| \-- step: Integer[variable]
| \-- time: Float[variable]
| \-- value: <type>[variable][D][D]
Both should be supported.
Actual behavior
In my experience only the observables after root are supported but observables/group are not. It leads to KeyError: "Unable to synchronously open object (object 'value' doesn't exist)". This seems to be briefly discussed in https://github.com/MDAnalysis/mdanalysis/issues/4320 but I wanted to give more information here.
Code to reproduce the behavior
I've attached a file licl_100.h5.zip because the cobrotoxin.h5md test file only contains observables/lambda but no group specific observables.
import MDAnalysis as mda
u = mda.Universe.empty(100, n_residues=100, atom_resindex=np.arange(100), trajectory=True)
reader = H5MDReader("licl_100.h5", convert_units=False)
u.trajectory = reader
....
Current version of MDAnalysis
- Which version are you using? (run
python -c "import MDAnalysis as mda; print(mda.__version__)") 2.7.0 - Which version of Python (
python -V)? 3.10.9 - Which operating system? Ubuntu 22
@edisj @ljwoods2 is this issue a known limitation in our H5MD implementation?
yes, this is related to point 1) in #4561
basically, /particles can contain arbitrary "atom groups" like /particles/group_1/, /particles/group_2/, etc... that contain the position, velocity, and force data for specific subsets of particles group_1, group_2. If these exist, the /observables group can contain corresponding /observables/group_1/*/value, /observables/group_2/*/value, where * can be whatever you calculate (e.g. total energy, temperature, ...) that corresponds to a calculation over specific atom groups group_1, group_2, ...
At the moment, H5MDReader assumes there is only 1 group in /particles, and all of the observables in /observables correspond to that single particle group
the same thing is happening in #4320
@PythonFZ how would you expect MDAnalysis to support groups, i.e., what exactly should be happening?
I could reproduce with
import MDAnalysis as mda
import numpy as np
u = mda.Universe.empty(100, n_residues=100, atom_resindex=np.arange(100), trajectory=True)
u.load_new("licl_100.h5", format="H5MD", convert_units=False)
and failed with
KeyError: "Unable to synchronously open object (object 'value' doesn't exist)"
If this is an otherwise legal H5MD file then we should be able to at least read the particles information, even if we ignore observables. So this may count as a bug in our H5MD reader and I tentatively assign the defect label.
However, I tried reading licl_100.h5 with the reference implementation (read.py from pyh5md)
python read.py licl_100.h5
Traceback (most recent call last):
File "~/tmp/h5md/read.py", line 17, in <module>
with File(args.file, 'r') as f:
^^^^^^^^^^^^^^^^^^^^
File "~/anaconda3/envs/mda311/lib/python3.11/site-packages/pyh5md/h5md_module.py", line 236, in __init__
raise KeyError("h5md group not found in file")
KeyError: 'h5md group not found in file'
and note that @PythonFZ 's H5MD file cannot be loaded by pyh5md, presumably because it is missing the mandatory h5md root level group (see https://www.nongnu.org/h5md/h5md.html#h5md-root-level ). I don't have time to dissect the h5 file further so could you please comment and demonstrate that the test file does indeed fully conform to the spec?
and note that @PythonFZ 's H5MD file cannot be loaded by pyh5md, presumably because it is missing the mandatory h5md root level group (see https://www.nongnu.org/h5md/h5md.html#h5md-root-level ).
that's interesting... should our implementation also fail immediately if the root h5md group is missing?
yes — it's mandatory
Thanks for looking into this @orbeckst.
This archive md.zip contains two updated files that fulfill the H5MD standard. The obs_root.h5 can be read by MDAanlysis, where the energy is stored in observables/energy. The obs_per_particles.h5 yields the error with the energy being in observables/atoms/energy.
Both files have been created using pyh5md.
how would you expect MDAnalysis to support groups, i.e., what exactly should be happening?
I think it would be beneficial to load the data into Timestep.data just like with the obs_root.h5 file.
Observables in H5MD. Let's assume we have the below:
observables
\-- <observable1>
| \-- step: Integer[variable]
| \-- time: Float[variable]
| \-- value: <type>[variable]
\-- <observable2>
| \-- step: Integer[variable]
| \-- time: Float[variable]
| \-- value: <type>[variable][D]
\-- <group1>
| \-- <observable3>
| \-- step: Integer[variable]
| \-- time: Float[variable]
| \-- value: <type>[variable][D][D]
\-- <observable4>: <type>[]
\-- ...
@PythonFZ, do you expect to always get ts.data['observable1'], ts.data['observable2'], ts.data['group1/observable3'], ts.data['observable4'] for each timestep step at which the observables have been recorded or would you like to specify which observables are pulled out?
What should happen when, say, group1/observable3 was only recorded every 10 steps? We have then timesteps at which there's no value defined for the observable.
as auxiliaries??
We have solved some of these questions with the auxiliary readers already; for instance, auxiliaries can "hold" a value for steps where it wasn't recorded — the details are in Auxiliaries and trajectories.
Perhaps it makes more sense to create an H5MD AuxReader (which may, under the hood, pull data out of the H5MDReader so that we don't open a trajectory twice) instead of trying to drop everything in ts.data.
The docs for auxiliaries are not very clear for users; at least the ones for the EDRReader and trajectories shows you what it looks like in practice and for h5md it might be
u = mda.Universe(topology, "trj.h5md")
aux = mda.auxiliary.H5MD.H5MDObservablesReader(u.trajectory) # can pass reader instance!
u.trajectory.add_auxiliary({"obs1": "obervable1",
"group1_obs3": "group1/observable3"},
"group1/observable99": "group1/observable99"},
aux)
for ts in u.trajectory:
print("observable1 = ", ts.aux.obs1)
print("observable3 (group1) = ", ts.aux.group1_obs3)
print("observable99 (group 1) = ", ts.aux["group1/observable99"]) # only dict
Would this be of interest?
It's also helpful if you write out code that you would have liked to use because that tells us how we should be structuring our user interface.
(Also, there are no promises that anyone is actually going to work on any of this. But if enough people think it would be useful then it becomes more attractive as a project for someone to pick it up.)
I think the flexible timesteps in general are difficult to handle. I'd suggest to do this in two steps:
- Allow MDAnalysis to read files with
observables/particle/<name>/value. This can be achieved via https://github.com/MDAnalysis/mdanalysis/pull/4615 - Consider the original issue of how observables should be handled. How is this handled for e.g. positions and velocities being dumped at different time intervals?
I'm happy to help with this. Currently, I don't have a use case for the observables - I only want to be able to read the file including cell, position, velocities. Do you think auxiliaries fit by looking at the definition of the two groups?
Particles: The size of the stored data scales linearly with the number of particles under consideration
Observables: The size of stored data is typically independent of the system size
Xref: https://github.com/MDAnalysis/mdanalysis/pull/4615#pullrequestreview-2124420584 prototype for just dumping observables of the /group/observable form into ts.data @orbeckst @edisj. Happy to proceed with #4615 as is first while we work on the proper fix.
Update MDAnalysis to the latest version, such as 2.9.0, released on March 11, 2025. This should fix the issue, as a relevant update was merged in July 2024, after your current version. You can check the installation instructions at MDAnalysis User Guide.