pint-xarray
pint-xarray copied to clipboard
Raise pint.DimensionalityError instead of ValueError
If you try to convert a pint.Quantity
to an incompatible unit you get a pint.DimensionalityError
, whereas if you try to convert a quantified DataArray
to an incompatible unit you get a ValueError
. These should give the same type of error.
In [1]: import pint_xarray
In [2]: import xarray as xr
In [3]: import pint
In [4]: mass_q = pint.Quantity(10, units='kg')
In [5]: mass_q.to('seconds')
---------------------------------------------------------------------------
DimensionalityError Traceback (most recent call last)
<ipython-input-5-86c449341523> in <module>
----> 1 mass_q.to('seconds')
~/Documents/Work/Code/pint/pint/quantity.py in to(self, other, *contexts, **ctx_kwargs)
722 other = to_units_container(other, self._REGISTRY)
723
--> 724 magnitude = self._convert_magnitude_not_inplace(other, *contexts, **ctx_kwargs)
725
726 return self.__class__(magnitude, other)
~/Documents/Work/Code/pint/pint/quantity.py in _convert_magnitude_not_inplace(self, other, *contexts, **ctx_kwargs)
671 return self._REGISTRY.convert(self._magnitude, self._units, other)
672
--> 673 return self._REGISTRY.convert(self._magnitude, self._units, other)
674
675 def _convert_magnitude(self, other, *contexts, **ctx_kwargs):
~/Documents/Work/Code/pint/pint/registry.py in convert(self, value, src, dst, inplace)
1001 return value
1002
-> 1003 return self._convert(value, src, dst, inplace)
1004
1005 def _convert(self, value, src, dst, inplace=False, check_dimensionality=True):
~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace)
1915 value, src = src._magnitude, src._units
1916
-> 1917 return super()._convert(value, src, dst, inplace)
1918
1919 def _get_compatible_units(self, input_units, group_or_system):
~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace)
1516
1517 if not (src_offset_unit or dst_offset_unit):
-> 1518 return super()._convert(value, src, dst, inplace)
1519
1520 src_dim = self._get_dimensionality(src)
~/Documents/Work/Code/pint/pint/registry.py in _convert(self, value, src, dst, inplace, check_dimensionality)
1034 # then the conversion cannot be performed.
1035 if src_dim != dst_dim:
-> 1036 raise DimensionalityError(src, dst, src_dim, dst_dim)
1037
1038 # Here src and dst have only multiplicative units left. Thus we can
DimensionalityError: Cannot convert from 'kilogram' ([mass]) to 'second' ([time])
In [6]: mass_da = xr.DataArray(10).pint.quantify(units='kg')
In [7]: mass_da.pint.to('seconds')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units(obj, units)
231 try:
--> 232 new_obj = call_on_dataset(
233 convert_units_dataset, obj, name=temporary_name, units=units
~/Documents/Work/Code/pint-xarray/pint_xarray/compat.py in call_on_dataset(func, obj, name, *args, **kwargs)
12
---> 13 result = func(ds, *args, **kwargs)
14
~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units_dataset(obj, units)
216 if failed:
--> 217 raise ValueError(failed)
218
ValueError: {None: DimensionalityError()}
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-7-00816f2a573d> in <module>
----> 1 mass_da.pint.to('seconds')
~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in to(self, units, **unit_kwargs)
568 units = either_dict_or_kwargs(units, unit_kwargs, "to")
569
--> 570 return conversion.convert_units(self.da, units)
571
572 def chunk(self, chunks, name_prefix="xarray-", token=None, lock=False):
~/Documents/Work/Code/pint-xarray/pint_xarray/conversion.py in convert_units(obj, units)
238 failed[obj.name] = failed.pop(temporary_name)
239
--> 240 raise ValueError(format_error_message(failed, "convert")) from e
241
242 return new_obj
ValueError: Cannot convert variables:
incompatible units for variable None: Cannot convert from 'kilogram' ([mass]) to 'second' ([time])
Looking at this further this is actually quite tricky. Currently the code will attempt to convert all variables in the xarray object, collect the errors, and then raise a ValueError
with a list of what went wrong for each variable. That's clever, and useful for users to see all the mistakes they made in one go.
However even if only one error was raised, or all the errors that were raised were of the same type, then currently it will still raise a ValueError
regardless if the actual errors were DimensionalityError
s or what.
I tried fixing that by changing convert_units_dataset
to look like this
def convert_units_dataset(obj, units):
converted = {}
failed = {}
for name, var in obj.variables.items():
unit = units.get(name)
try:
converted[name] = convert_units_variable(var, unit)
except (ValueError, pint.errors.PintTypeError) as e:
failed[name] = e
if failed:
exception_types = [type(e) for e in failed.values()]
if len(set(exception_types)) == 1:
# If all failures were of the same type then raise exception of that type
# Ensures simple cases of invalid unit conversion raise a pint.DimensionalityError
raise exception_types[0](failed)
else:
raise ValueError(failed)
return dataset_from_variables(converted, obj._coord_names, obj.attrs)
but that doesn't work because the init signature of DimensionalityError
is different, so it will fail when trying to construct the DimensionalityError
.
Fundamentally trying to combine multiple different errors into one is a bit weird, so what I suggest we do is raise the first error encountered, but with an longer message that can list any later errors found. That should hopefully be the best of both: specific error types for operations on one variable as you would expect, but the user can still see everything they messed up in one try. (And as dict
s retaining insertion order is apparently guaranteed past python 3.7, the same error will definitely be raised everytime the conversion function is run.) What do you think @keewis?
I'm not particularly attached to having it raise ValueError
, but there were a few issues I ran into that could be avoided that way. What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages (and I would like to avoid special-casing a single failure vs multiple failures). Maybe it's easier to change the error message to also include the type of the exception? We could also create a special exception that optionally accepts the failed
dict so you can still have the raw exceptions and their tracebacks for debugging?
Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint
error.
Fundamentally I just feel quite strongly that errors caused by incorrect units should raise a pint unit-related error. It would be consistent with Quantity
, it allows for catching unit-related errors specifically, and it draws a clearer line for users between "this failed because your code isn't valid" and "your code is valid but this failed because your science equations don't represent what you think they represent".
What I remember is that choosing the right exception for multiple failures gets complicated really quickly, so always raising the same exception type for the same class of errors has its advantages
Is it not fairly simple to just raise the first error encountered? Especially if we know/expect that error is only going to be one of isinstance(err, [pint.Error, TypeError, ValueError])
.
I would like to avoid special-casing a single failure vs multiple failures
True, but "Make a list, raise the first, display any remaining" is logic that is basically the same even for a single error.
Maybe it's easier to change the error message to also include the type of the exception?
We could also create a special exception that optionally accepts the failed dict so you can still have the raw exceptions and their tracebacks for debugging?
Its the actual exception type that I would like to be changed, not the message attached to the ValueError. Also I thought that message already did include the types anyway?
Note that exceptions raised by the magnitude will not be caught, so those exceptions can shadow a previously failed pint error.
I'm not quite sure I understand what you mean by this.
I just saw PEP-654, which seems like it would resolve this (and much better than a single exception class could, if I understand correctly): we'd define a PintExceptionGroup
(or a PintException
that is a ExceptionGoup
) and raise it instead of ValueError
. There's even a backport library so we don't have to wait until we can drop py3.10
.
Edit: maybe we don't even need the ExceptionGroup
, but rather a exception that works like one. The advantage of having it be a ExceptionGroup
is that there is a standardized way to access information, whereas having it be a custom exception would allow us to bypass the new except*
syntax (which I didn't fully understand, so that might not be an issue)
we'd define a PintExceptionGroup
That looks cool! If that allows us to raise both types of error then it seems like it could be a good solution.
There's also PEP 678 which allows attaching additional information to a exception. In our case, that would be the variable name and, in the case of quantify
, the source of the units (attribute or parameter).
I did investigate using that, but because they are scheduled for python=3.11
there are a quite a few projects that don't support those yet, e.g. pytest
, rich
, or ipython
, and thus jupyter
. The result is that tracebacks include less information than right now (even though the information is there). I suppose we could add some special code that detects and mitigates that, but this seems like a lot of work so I'd rather wait for a bit.
What do you think about postponing this feature until there's a bit more support? That would most likely be after the release of python=3.11
, but I guess we don't release that often, anyways.
What do you think about postponing this feature until there's a bit more support?
Totally fine with postponing this!