pyiron_base
UnitConverter - usage
Hi,
I really like the idea of having a generic conversion object that lets users work with and store data in units that make sense to them. However, I am slightly confused about how this class (and its decorators) can be used with a specific pyiron_atomistics package. Assuming I want to work in atomic units (one of the unit systems available in the pint library), how are the units stored in the HDF file, and can I still use the generic atomistics functions (such as animate_structure, which requires the positions to be given in angstrom)?
Thanks in advance,
All the best, Sander
Hi @SanderBorgmans ,
It is currently still a work in progress, but yes, the goal is that you can specify the units in your ~/.pyiron configuration and then they are used everywhere. @sudarsan-surendralal Can you update us on the latest status of this module?
Best,
Jan
Thanks for bringing this up. As @jan-janssen said, this feature is still a work in progress. Over the weekend, I'll put together a demo notebook illustrating how we plan to use it, which I think will be helpful.
@jan-janssen @sudarsan-surendralal Sounds great! I've done some experimenting with the existing classes and thought that the UnitConverter seemed a bit redundant, since pint already implements many different unit systems (and provides the conversion factors through q.to_base_units()). The main drawback is probably that the 'pyiron' unit conventions are not yet implemented as a system (but they could easily be added by defining a Systems object). This would allow the user to select the unit system, while the code units could simply be implemented on a plugin basis.
```python
# Proposed helper methods for UnitConverter; they assume the converter keeps
# its code-unit registry in self._code_registry, indexable by quantity name.
def code_to_base_pint(self, quantity):
    """
    Get the conversion factor as a `pint` quantity from code to base units

    Args:
        quantity (str): Name of quantity

    Returns:
        pint.quantity.Quantity: Conversion factor as a `pint` quantity
    """
    return (1 * self._code_registry[quantity]).to_base_units()


def base_to_code_pint(self, quantity):
    """
    Get the conversion factor as a `pint` quantity from base to code units

    Args:
        quantity (str): Name of quantity

    Returns:
        pint.quantity.Quantity: Conversion factor as a `pint` quantity
    """
    return 1. / ((1 * self._code_registry[quantity]).to_base_units())
```
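Letting the user pick a unit system while each plugin declares its own code units boils down to ratios of unit sizes. Here is a pint-free sketch of that bookkeeping, i.e. what `code_to_base_pint`/`base_to_code_pint` hand over to pint. `UNIT_SYSTEMS` and `conversion_factor` are hypothetical names, the "atomic" entry merely stands in for some plugin's code units, and the constants are CODATA 2018 values.

```python
# Hypothetical, pint-free sketch: each unit system maps a quantity to the size
# of one unit of that quantity, expressed in SI units.
ANGSTROM = 1e-10                  # m
BOHR = 5.29177210903e-11          # m, CODATA 2018
ELECTRON_VOLT = 1.602176634e-19   # J, exact SI value
HARTREE = 4.3597447222071e-18     # J, CODATA 2018

UNIT_SYSTEMS = {
    # "pyiron" is the user-facing default, "atomic" stands in for a plugin's code units
    "pyiron": {"distance": ANGSTROM, "energy": ELECTRON_VOLT, "volume": ANGSTROM ** 3},
    "atomic": {"distance": BOHR, "energy": HARTREE, "volume": BOHR ** 3},
}


def conversion_factor(quantity, from_system, to_system):
    """Number to multiply a value of `quantity` by to go from one system to another."""
    return UNIT_SYSTEMS[from_system][quantity] / UNIT_SYSTEMS[to_system][quantity]


code_to_user = conversion_factor("distance", "atomic", "pyiron")  # about 0.529
user_to_code = conversion_factor("distance", "pyiron", "atomic")  # about 1.890
print(code_to_user, user_to_code)
```

Defining a custom pyiron system inside pint itself, as suggested above, would replace the hand-written dictionary; the sketch only shows that base-to-code is the reciprocal of code-to-base.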
Thanks for the feedback @SanderBorgmans. If I understand correctly, to_base_units can only convert to the unit systems recognized by pint (unless there is a way to add our own custom unit systems with angstrom, eV, etc.). I think this is where our new implementation could help. Below is an example of how I intend to use the new classes, along with a comparison to pint.
```python
import numpy as np
import pint
from pyiron_base.generic.units import PyironUnitRegistry, UnitConverter

ureg = pint.UnitRegistry(system='atomic')

# for unit_sys in dir(ureg.sys):
#     ureg.default_system = unit_sys
#     print((5 * ureg.angstrom).to_base_units())
```
Pint conversion
```python
distance = 5 * ureg.angstrom
distance
# 5 angstrom

distance.to_base_units()
# 9.448630623111397 bohr

distance = np.ones(3) * 5 * ureg.angstrom
print(distance.to_base_units())
# [9.448630623111397 9.448630623111397 9.448630623111397] bohr
```
Advantages
- Automatic conversion (nothing to implement on our side)
Disadvantages
- Only converts between recognized systems of units (SI, CGS, atomic); otherwise we have to define our own custom system of units
pyiron's way (what we proposed)
```python
base_registry = PyironUnitRegistry()
base_registry.add_quantity(quantity="distance", unit=ureg.angstrom, data_type=float)
base_registry.add_quantity(quantity="energy", unit=ureg.eV, data_type=float)
base_registry.add_quantity(quantity="volume", unit=ureg.angstrom ** 3, data_type=float)

code_registry = PyironUnitRegistry()
code_registry.add_quantity(quantity="distance", unit=ureg.bohr, data_type=float)
code_registry.add_quantity(quantity="energy", unit=ureg.hartree, data_type=float)
code_registry.add_quantity(quantity="volume", unit=ureg.bohr ** 3, data_type=float)

unit_converter = UnitConverter(base_registry=base_registry, code_registry=code_registry)


@unit_converter.base_units(quantity="distance")
def return_ones_ang():
    return np.ones(5)


@unit_converter.code_units(quantity="distance")
def return_ones_bohr():
    return np.ones(5)


print(return_ones_ang())
print(return_ones_bohr())
# [1.0 1.0 1.0 1.0 1.0] angstrom
# [1.0 1.0 1.0 1.0 1.0] bohr
```
```python
@unit_converter.code_to_base(quantity="distance")
def return_ones_ang():
    return np.ones(5)


@unit_converter.base_to_code(quantity="distance")
def return_ones_bohr():
    return np.ones(5)


print(return_ones_ang())
print(return_ones_bohr())
# [0.52917721 0.52917721 0.52917721 0.52917721 0.52917721]
# [1.88972612 1.88972612 1.88972612 1.88972612 1.88972612]

unit_converter.base_to_code_value(quantity="distance")
# 1.8897261246222794
```
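For anyone wondering what a decorator such as `code_to_base` does under the hood, here is a minimal pint-free sketch of the idea: a decorator factory that multiplies a function's plain-number return value by a fixed factor. `SimpleUnitConverter`, its factor table and `bond_length` are hypothetical; this is not pyiron_base's actual implementation.

```python
import functools

# Hypothetical sketch, not pyiron_base's implementation: plain numbers in,
# plain numbers out, scaled by one factor per quantity.
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in angstrom


class SimpleUnitConverter:
    def __init__(self, code_to_base_factors):
        # {"distance": 0.529...} means: 1 code unit = 0.529... base units
        self._factors = dict(code_to_base_factors)

    def code_to_base_value(self, quantity):
        return self._factors[quantity]

    def base_to_code_value(self, quantity):
        return 1.0 / self._factors[quantity]

    def _scaling_decorator(self, factor):
        # Wrap a function so that its return value is multiplied by `factor`.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs) * factor
            return wrapper
        return decorator

    def code_to_base(self, quantity):
        """Decorator: a result computed in code units is returned in base units."""
        return self._scaling_decorator(self.code_to_base_value(quantity))

    def base_to_code(self, quantity):
        """Decorator: a result computed in base units is returned in code units."""
        return self._scaling_decorator(self.base_to_code_value(quantity))


converter = SimpleUnitConverter({"distance": BOHR_IN_ANGSTROM})


@converter.code_to_base(quantity="distance")
def bond_length():
    return 2.0  # illustrative value "computed" in bohr


print(bond_length())  # the same length, now in angstrom
```

In this sketch the factor is looked up once, when the function is decorated; looking it up inside `wrapper` instead would pick up later changes to the registry.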
Hi @sudarsan-surendralal,
Thanks for the example. Pint does allow for specifying custom base unit systems, which is why I thought it would be best to work with a single pyiron registry as a kind of wrapper around the pint registry.
At the moment I'm just not quite clear on how the unit systems interact with the pyiron database. Will pint quantities be stored in the pyiron input/output instead of plain floats/numpy arrays, or will the unit registries convert the values before/after storing them (either manually in the notebook, by wrapping every variable, or automatically, by creating aliases)?
The registry containing the code units can certainly be written as a module for every pyiron plugin. In that case, the numbers in the h5 file are always well defined if you look at the source code of your pyiron version (or the units could be written somewhere to the h5 file).
The issue is that I need to know where to place the wrappers. If you retrieve, for instance, job['output/generic/energy_tot'], do you want it to already be in your base units (so that the converted numbers are stored in the h5 file and the wrappers only go in write_input/collect_output), do you convert it manually each time, or should each job['path'] be an alias for a wrapped equivalent?
You could, for instance, add a unit_system attribute to each project or job that performs this conversion automatically (using either the pint systems or a custom pyiron system, which shouldn't be too difficult to define) and store only the converted numbers in the h5 file, so that we don't need to store units for each dataset individually.
What do you think?
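The job-level unit_system idea can be sketched as follows, under our own assumptions: output parsed in code units is converted once in a collect_output step, only the converted numbers are stored, and the unit system is recorded a single time per job. `UnitSystemJob` and the table layout are hypothetical, and a plain dict stands in for the h5 file.

```python
# Hypothetical sketch of a job-level unit_system: output parsed in code units is
# converted once, when it is collected, and only converted numbers are stored.
# A plain dict stands in for the h5 file.
UNIT_SYSTEMS = {  # SI size of one unit (CODATA 2018 / exact SI values)
    "pyiron": {"distance": 1e-10, "energy": 1.602176634e-19},
    "atomic": {"distance": 5.29177210903e-11, "energy": 4.3597447222071e-18},
}


class UnitSystemJob:
    def __init__(self, unit_system="pyiron"):
        self.unit_system = unit_system
        # The unit system is written once per job instead of once per dataset.
        self._h5 = {"unit_system": unit_system}

    def collect_output(self, parsed, code_system):
        """Store parsed output ({path: (quantity, value)}) in the job's unit system."""
        for path, (quantity, value) in parsed.items():
            factor = (UNIT_SYSTEMS[code_system][quantity]
                      / UNIT_SYSTEMS[self.unit_system][quantity])
            self._h5[path] = value * factor  # works for floats and numpy arrays

    def __getitem__(self, path):
        return self._h5[path]


job = UnitSystemJob(unit_system="pyiron")
job.collect_output({"output/generic/energy_tot": ("energy", -1.0)}, code_system="atomic")
print(job["output/generic/energy_tot"])  # about -27.2114 (eV)
print(job["unit_system"])                # pyiron
```

The trade-off is that the numbers in the file now depend on the job's setting, so anything reading the file has to honour the stored label.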
Our current approach aims to store all results as plain numbers/numpy arrays in pyiron units in the HDF file. Ideally, we would enrich the output of job.output.quantity with the correct unit. This is not yet done.
In the next step, one could define a desired unit system and always convert accordingly.
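A minimal sketch of both steps, assuming (our assumption, not the pyiron design) that enriching the output means returning the stored number together with a unit label, and that the desired unit system is applied on access. `Output`, `ValueWithUnit`, `OUTPUT_QUANTITIES` and the field names are hypothetical.

```python
from typing import NamedTuple

# Hypothetical tables: SI size and label of one unit, per unit system.
UNIT_SYSTEMS = {
    "pyiron": {"energy": (1.602176634e-19, "eV"), "distance": (1e-10, "angstrom")},
    "atomic": {"energy": (4.3597447222071e-18, "hartree"),
               "distance": (5.29177210903e-11, "bohr")},
}
# Which physical quantity each output field holds (illustrative field names).
OUTPUT_QUANTITIES = {"energy_tot": "energy", "cell_length": "distance"}


class ValueWithUnit(NamedTuple):
    value: float
    unit: str


class Output:
    """Hypothetical accessor: the HDF file keeps plain numbers in pyiron units."""

    def __init__(self, raw, unit_system="pyiron"):
        self._raw = raw                  # plain numbers as read from the file
        self._unit_system = unit_system  # unit system the user asked for

    def __getattr__(self, name):
        raw = self.__dict__.get("_raw", {})
        if name not in raw:
            raise AttributeError(name)
        quantity = OUTPUT_QUANTITIES[name]
        stored_size, _ = UNIT_SYSTEMS["pyiron"][quantity]
        target_size, target_label = UNIT_SYSTEMS[self._unit_system][quantity]
        # Scale from pyiron units to the requested system and attach the label.
        return ValueWithUnit(raw[name] * (stored_size / target_size), target_label)


raw = {"energy_tot": -27.211386245988}  # illustrative stored value, in eV
print(Output(raw).energy_tot)                        # value unchanged, unit 'eV'
print(Output(raw, unit_system="atomic").energy_tot)  # about -1.0, unit 'hartree'
```

With pint available, the label could instead be turned into a real quantity via `ureg.Quantity(value, unit)`.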