Long term solution for units in AuxReaders and in general
Is your feature request related to a problem?
Some discussions on #3749 brought up conversion of units in the AuxReaders. Our current solution, used in the coordinate readers, is to hardcode the units for the relevant format and then convert when required.
While this may be an OK solution for the coordinates, where the units are relatively simple, the more exotic units that can appear in an EDR file, for example, make this process more complicated. This is a general problem for all proposed AuxReaders, as they are generally aimed at providing an interface to those "more exotic" pieces of information obtained from a simulation.
Additionally, this solution doesn't scale well to packages like LAMMPS, where you can use up to 8 different unit systems. We will probably need to consider whether some kind of units package is the best way to deal with this long term.
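To make the scaling problem concrete, the hardcoding approach used by the coordinate readers can be sketched roughly as a per-format unit table plus a flat conversion-factor lookup (the names below are hypothetical, loosely modelled on how MDAnalysis handles length units; this is a sketch, not the actual implementation):

```python
# Each format declares its native units; conversion goes through a flat
# factor table relative to the package's base unit (Angstrom for length).
# Every new format and every new quantity means extending these tables
# by hand -- which is exactly what doesn't scale to EDR- or LAMMPS-style
# unit variety.
FORMAT_UNITS = {
    "TRR": {"length": "nm", "time": "ps"},
    "DCD": {"length": "Angstrom", "time": "AKMA"},
}

# Factors to convert each length unit to the base unit (Angstrom).
LENGTH_FACTORS = {"Angstrom": 1.0, "nm": 10.0, "pm": 0.01}

def convert_length(value, from_unit, to_unit):
    """Convert a length value via the hardcoded factor table."""
    return value * LENGTH_FACTORS[from_unit] / LENGTH_FACTORS[to_unit]

# A TRR reader would convert nm -> Angstrom on the fly:
native = FORMAT_UNITS["TRR"]["length"]          # "nm"
print(convert_length(2.5, native, "Angstrom"))  # 25.0
```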
Use of unyt or pint was raised in the related issue #2230.
Personally I think we will need to move to a units management system at some point.
We'd also need to decide if more exotic units should be converted or not. Some of the units that can occur in EDR files are
- kJ/mol
- K
- bar
- unitless
- nm
- nm^3
- kg / m^3
- bar * nm
- nm / ps
- D
and a few of these are not yet mentioned in our docs.
What do we want to do with pressures and dipole moments, for example?
Most should be derivable from our base units, e.g. pressure is force/area so should be kJ/(mol A^3)? And dipole moment e*A. But I agree this is a bit of a challenge and we should enhance the docs whatever we decide.
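The pressure derivation above can be checked by hand: a kJ/mol is an energy per mole, so dividing by Avogadro's number gives energy per particle, and dividing by a volume in Å³ then yields a pressure directly comparable to bar (the constants are standard; the script is just an arithmetic check, not proposed code):

```python
# Verify that 1 kJ/(mol Angstrom^3) is a well-defined pressure and find
# its value in bar (the unit GROMACS writes to EDR files).
N_A = 6.02214076e23      # 1/mol (exact, 2019 SI)
J_PER_KJ = 1.0e3
M3_PER_ANGSTROM3 = 1.0e-30
PA_PER_BAR = 1.0e5

# energy per particle / volume = pressure in Pa
pa = J_PER_KJ / N_A / M3_PER_ANGSTROM3
bar = pa / PA_PER_BAR
print(f"1 kJ/(mol A^3) = {bar:.1f} bar")  # 1 kJ/(mol A^3) = 16605.4 bar
```

So a single base-unit combination covers both the kJ/mol and bar entries in the EDR list, which is an argument for deriving exotic units rather than hardcoding each one.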
I am going to add this as a discussion point for 3.0. If we are going to change to a units package, 3.0 would be a good time to do it.
Xarray just transitioned to using pint. Check out their blog on the subject.
Compared to my comment in the other thread (long enough ago that I forget having written it), I'd now recommend Pint as the one solution to reach for. unyt is not abandoned or anything, but its maintenance has shifted from the original author to a team of other engineers in their organization. More importantly, its adoption is lower than Pint's, which I think has exceeded critical mass, has integrations with tons of other packages, etc.
Something else to consider is that Pint makes it easy to create your own unit registry, if you need exotic stuff not explicitly included in the default pint.UnitRegistry. OpenFF has its own units package which is basically a slightly-custom defaults.txt with a little bit of machinery to expose it at the module level. This makes it easy to add shorthands like the various molar kJ/cal units, etc. The major downside of this approach is that unit registries are not directly compatible with each other (so I think an extra conversion layer would be needed):
```python
>>> from openff.units import unit as unit1
>>> import pint
>>> unit2 = pint.UnitRegistry()
>>> 1.0 * unit1.meter + 1.0 * unit2.meter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 1154, in __add__
    return self._add_sub(other, operator.add)
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 139, in wrapped
    return f(self, *args, **kwargs)
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/quantity.py", line 1032, in _add_sub
    if not self._check(other):
  File "/Users/mattthompson/miniconda3/envs/openff-interchange-env/lib/python3.9/site-packages/pint/util.py", line 846, in _check
    raise ValueError(
ValueError: Cannot operate with Quantity and Quantity of different registries.
```
I'm a bit curious, though: is the proposal to change most uses of numpy.ndarray to pint.Quantity? That'd be a pretty significant change. If so, some other things that come to mind are:
- Some amount of NEP 18 is not implemented in Pint. numpy.linalg.norm comes to mind as one. I don't have a great sense of what is and isn't implemented; most of the basic stuff I do is supported.
- Some stuff can be slow. I haven't yet figured out why my code is slow, so I can't be much more helpful than that, but when I profile stuff I write that makes heavy use of Quantity objects, I find a lot of time is spent doing conversions. I'm probably doing some stuff the wrong way, but performance would be something to pay attention to while working on an implementation.
If I'm assuming incorrectly and the idea is to use Pint or something when doing I/O or conversions but keep the core data as numpy.ndarrays with implicit units ... then I guess disregard half of this.
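That second option (units at the I/O boundary, plain arrays in the core) could look something like the following sketch, which avoids Quantity objects in the hot path entirely (hypothetical names; the factor table stands in for whatever units machinery does the boundary conversion):

```python
import numpy as np

# "Units at the boundary" option: an AuxReader-style loader converts once
# on read, and the core only ever sees plain ndarrays in base units
# (Angstrom here), with units implicit.
TO_ANGSTROM = {"Angstrom": 1.0, "nm": 10.0}

def load_positions(raw, native_unit):
    """Convert raw positions to base units once; return a plain ndarray."""
    return np.asarray(raw, dtype=np.float64) * TO_ANGSTROM[native_unit]

positions = load_positions([[0.1, 0.2, 0.3]], "nm")  # stored as Angstrom
print(positions)        # [[1. 2. 3.]]
print(type(positions))  # plain ndarray: no unit wrapper in the hot path
```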
Thanks @mattwthompson!
Super useful info and good to get a perspective from an ecosystem that uses pint regularly.
Interesting comments about speed; given what you have said, I imagine we won't switch to pint.Quantity but will just use pint for I/O, conversions, and tagging objects with units.
Yeah, so I think in discussions with @orbeckst, @richardjgowers and @lilyminium we were also quite wary of the extra costs. I think there was maybe some rough idea where we would keep .positions as-is but offer a new pint-ed attribute, or something along those lines? (Someone might remember that conversation better.)
My objections lie mainly around having to contort code to fit operations that pint doesn't support, similar to what @mattwthompson mentioned. Given that we return a copy of a numpy array every time we request .positions, speed hasn't been my main concern, especially since you can access the underlying ._magnitude/.magnitude reasonably easily.
xarray has gotten around some of this with their numpy support (np.linalg.norm strips units but does work), so just using their data types could be a cool direction to look into.