Better annotations support

Open hgrecco opened this issue 3 years ago • 28 comments

With PEP 560 we could now try to offer a better annotation experience in Pint. Briefly, my proposal would be to do something like this:

class Model:

    value: Quantity['m/s']

or

class Model:

    value: Quantity['[length]/[time]']

and then provide a nice API to check for this.

What do you think?
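For readers unfamiliar with PEP 560, here is a minimal sketch of the mechanism the proposal relies on. The `Quantity` and `UnitSpec` classes below are hypothetical stand-ins written for illustration, not pint's classes; the only real API used is the `__class_getitem__` hook.

```python
class UnitSpec:
    """Hypothetical object returned by Quantity['...']; it only stores the string."""

    def __init__(self, spec: str) -> None:
        self.spec = spec

    def __repr__(self) -> str:
        return f"Quantity[{self.spec!r}]"


class Quantity:
    """Toy stand-in for pint.Quantity, used only to show the subscript hook."""

    def __class_getitem__(cls, spec: str) -> UnitSpec:
        # PEP 560: this hook is called for Quantity['m/s'], no metaclass needed.
        return UnitSpec(spec)


class Model:
    value: Quantity["m/s"]


# The annotation is an ordinary object that later code can inspect.
annotation = Model.__annotations__["value"]
print(annotation)
```

Whatever `__class_getitem__` returns ends up in `__annotations__`, so a checking API would only need to read it back from there.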

hgrecco avatar Aug 29 '20 12:08 hgrecco

and also these examples would be:

>>> @ureg.awrap
... def mypp(length: Quantity['meter']) -> Quantity['second']:
...     return pendulum_period(length)

and

>>> @ureg.acheck
... def pendulum_period(length: Quantity['[length]']):
...     return 2*math.pi*math.sqrt(length/G)

where awrap and acheck are annotated equivalents of wrap and check.
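A rough sketch of how an annotation-driven check decorator could work is given here. Everything in it is an assumption made for illustration: `acheck` is reimplemented as a toy, `Dim` is a hypothetical marker class, and the toy `Quantity` carries a hand-written `dimension` string instead of deriving it from units as pint would.

```python
import functools
import inspect


class Dim:
    """Hypothetical annotation marker holding an expected dimension string."""

    def __init__(self, dimension: str) -> None:
        self.dimension = dimension


class Quantity:
    """Toy quantity; `dimension` is given by hand rather than derived from units."""

    def __init__(self, magnitude: float, dimension: str) -> None:
        self.magnitude = magnitude
        self.dimension = dimension

    def __class_getitem__(cls, dimension: str) -> Dim:
        return Dim(dimension)


def acheck(func):
    """Check annotated arguments against their declared dimension at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Only parameters annotated with Quantity['...'] are checked.
            if isinstance(expected, Dim) and value.dimension != expected.dimension:
                raise TypeError(
                    f"{name}: expected {expected.dimension}, got {value.dimension}"
                )
        return func(*args, **kwargs)

    return wrapper


@acheck
def double(length: Quantity["[length]"]) -> float:
    return 2 * length.magnitude


print(double(Quantity(3.0, "[length]")))
```

Calling `double(Quantity(3.0, "[time]"))` raises `TypeError` in this sketch, because the argument's dimension does not match the annotation.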

hgrecco avatar Aug 30 '20 22:08 hgrecco

It would also be fantastic to have Mypy support for checking these annotations statically! Happy to contribute where I can.

jmuhlich avatar Sep 01 '20 21:09 jmuhlich

I haven't started using annotations in MetPy (yet), so I don't have any practical experience to rely on to see any obvious gotchas. In general, though, those look reasonable.

dopplershift avatar Sep 02 '20 05:09 dopplershift

I was playing with this concept. Some things to discuss:

  1. Can annotations be done with units (e.g. m/s) and dimensions (e.g. [length]/[time])? Yes, as there are valid use cases for both (e.g. wrapping vs checking).
  2. What is the output type of Quantity['m/s']?
    • A str? No.
    • A Quantity?
    • A new class (e.g. TypedQuantity)?
    • A UnitContainer or similar?
  3. Can we annotate with ureg.meter?

hgrecco avatar Dec 28 '20 14:12 hgrecco

It would be nice to be able to declare the type that magnitude returns: np.array, float or any other supported type. When something is simply typed as Quantity, the user sometimes has to guess.

Collection types already do this, e.g. List[float] or Tuple[str, int]: you then know what's inside.
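To make the comparison with `List[float]` concrete, here is a small sketch of a quantity class that is generic in its magnitude type. The `Quantity` class is a toy written for this example, not pint's implementation, and `total_length` is a hypothetical helper.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Quantity(Generic[T]):
    """Toy generic quantity; only the magnitude typing is sketched."""

    def __init__(self, magnitude: T, units: str) -> None:
        self._magnitude = magnitude
        self.units = units

    @property
    def magnitude(self) -> T:
        return self._magnitude


def total_length(parts: list[Quantity[float]]) -> float:
    # A type checker can infer that each `part.magnitude` is a float.
    return sum(part.magnitude for part in parts)


print(total_length([Quantity(1.5, "m"), Quantity(2.5, "m")]))
```

With this shape, a reader of `Quantity[float]` knows what `magnitude` holds in the same way a reader of `List[float]` knows the element type.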

jules-ch avatar Dec 28 '20 14:12 jules-ch

Hello, I am writing a webapp for designing rural water supplies and I make extensive use of both pint and mypy. I would therefore be glad to contribute to exposing Quantity to mypy.

My objective is to write code like the following:

class WaterPipeline:
    @property
    def get_pipeline_pathlength(self) -> Quantity['length']:
        """"""

In my case annotations should be done using dimensions: in the example above it is important to check that a Quantity['length'] is returned, but such length may be expressed in meters or kilometers.

I also agree with @jules-ch's comment above about the expected type of magnitude.

claudiofinizio avatar Dec 29 '20 11:12 claudiofinizio

@hgrecco NumPy has also been adding annotation support for ndarray inputs, so it would be important IMO to make sure whatever is done here is compatible/sensible with that.

dopplershift avatar Dec 29 '20 20:12 dopplershift

How about something like

  • Quantity: any quantity
  • Quantity[t]: a quantity with t magnitude. eg. float, int, ndarray (or list)
  • Quantity[s] with s a string: a quantity with units (or dimensions) given by s. eg. m/s, [length]/[time]
  • Quantity[t, s]

But I would be more worried about how to handle this.

  • What is the output type?
  • Which convenience functions or methods do we add to make this useful?
  • How do we explain this?
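One possible way to accept all four spellings listed above is to normalize the subscript inside `__class_getitem__`. The sketch below is only an illustration of that idea: `QuantitySpec` is a hypothetical name, and the toy `Quantity` does not answer the open question of what the output type should be in pint.

```python
from typing import Any, NamedTuple, Optional


class QuantitySpec(NamedTuple):
    """Hypothetical result of subscripting: magnitude type and unit/dimension string."""

    magnitude_type: Optional[type]
    units: Optional[str]


class Quantity:
    """Toy stand-in for pint.Quantity; only the subscript parsing is sketched."""

    def __class_getitem__(cls, params: Any) -> QuantitySpec:
        # Quantity[a] passes `a`; Quantity[a, b] passes the tuple `(a, b)`.
        if not isinstance(params, tuple):
            params = (params,)
        magnitude_type, units = None, None
        for param in params:
            if isinstance(param, str):
                units = param  # e.g. 'm/s' or '[length]/[time]'
            elif isinstance(param, type):
                magnitude_type = param  # e.g. float or int
            else:
                raise TypeError(f"unsupported parameter: {param!r}")
        return QuantitySpec(magnitude_type, units)


print(Quantity[float])
print(Quantity["m/s"])
print(Quantity[float, "[length]/[time]"])
```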

hgrecco avatar Dec 29 '20 21:12 hgrecco

The type annotation of the magnitude should be the first thing we target, since the Quantity type is a container just like List or Tuple. The unit or dimension should come second. Just like you said @hgrecco, so something like

Quantity[type, unit]
Quantity[type, dimension]

The type annotation serves mypy, and the unit or dimension tells the user what to expect. We can then go further and check the unit or dimension at runtime.

jules-ch avatar Dec 29 '20 21:12 jules-ch

For some internal projects, I have tried three different approaches to annotations for the output of Something[args] (where something is a class):

  1. an instance of another class. This is what Python 3.9 does for containers, e.g. list[str] returns GenericAlias(list, str). Two options branch here: (1a) use GenericAlias or (1b) create a new class with extra methods.
  2. an instance of TypedSomething, which is a subclass of Something, with args stored as instance variables.
  3. a new class (a different one for every arg)

I would discourage (3) in pint, but I am not so sure about the other two. Option 1a is the simple way to go but not so ergonomic. Option 1b is better, because new methods could be added to test for equivalence between annotations or whether a given quantity satisfies an annotation.

Option 2 would allow for things like the following:

ScalarVelocityQ = Quantity[float, '[speed]']
q1 = ScalarVelocityQ(3, 'm/s')
q2 = ScalarVelocityQ(3, 's') # Exception is raised

In any case, I think we need to add good annotation introspection capability because we want to be able to evolve this without breaking everything. We need to avoid having to provide something like this https://stackoverflow.com/a/52664522/482819
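As a small illustration of option 1, the sketch below first shows what Python 3.9+ returns for `list[str]`, then a hand-rolled alias class in the spirit of option 1b. `QuantityAlias`, its `is_satisfied_by` method, and the toy `Quantity` are hypothetical names chosen for this example; they are not pint APIs.

```python
from types import GenericAlias
from typing import get_args, get_origin

# Built-in containers (Python 3.9+) return a GenericAlias when subscripted.
alias = list[str]
print(type(alias) is GenericAlias, get_origin(alias), get_args(alias))


class QuantityAlias:
    """Hypothetical option-1b object: stores origin and args, adds a helper."""

    def __init__(self, origin, args) -> None:
        self.__origin__ = origin
        self.__args__ = args

    def is_satisfied_by(self, value) -> bool:
        magnitude_type, dimension = self.__args__
        return (
            isinstance(value.magnitude, magnitude_type)
            and value.dimension == dimension
        )


class Quantity:
    """Toy quantity with a hand-written dimension string."""

    def __init__(self, magnitude, dimension: str) -> None:
        self.magnitude = magnitude
        self.dimension = dimension

    def __class_getitem__(cls, params) -> QuantityAlias:
        return QuantityAlias(cls, params)


ScalarVelocityQ = Quantity[float, "[speed]"]
print(ScalarVelocityQ.is_satisfied_by(Quantity(3.0, "[speed]")))
print(ScalarVelocityQ.is_satisfied_by(Quantity(3.0, "[time]")))
```

Keeping the origin and args on a dedicated object is what makes later introspection possible without parsing reprs or relying on private typing internals.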

hgrecco avatar Dec 29 '20 22:12 hgrecco

We could take a look at https://docs.python.org/3/library/typing.html#typing.Annotated, which describes what we want to achieve, I think.
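For reference, this is roughly what `typing.Annotated` (PEP 593, Python 3.9+) offers. The example uses only the standard library; the `Model` class and the `"m/s"` string are placeholders chosen for illustration.

```python
from typing import Annotated, get_args, get_type_hints


class Model:
    # The string is free-form metadata; static type checkers only see `float`.
    speed: Annotated[float, "m/s"]


# include_extras=True keeps the metadata; the default strips it.
hints = get_type_hints(Model, include_extras=True)
base_type, *metadata = get_args(hints["speed"])
print(base_type, metadata)
print(get_type_hints(Model)["speed"])
```

The base type stays visible to mypy while the metadata remains available to runtime code, which is the split between static and runtime checks discussed in this thread.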

jules-ch avatar Jan 04 '21 13:01 jules-ch

Referring to @jules-ch's comment: in my opinion Quantity is not just a container. When I read somebody's code, I would first like to see whether the return value of a function represents, say, a length, an energy, a pressure or whatever. Only afterwards would I be interested in whether that energy is, say, an integer, a float or some numpy type. At least, this is how I see it when "you first glance at somebody's code".

In short, I think Quantity[dimension] should be the first info somebody looks for. Accordingly, "option 2" proposed by @hgrecco, ScalarVelocityQ = Quantity[float, '[speed]'], seems to me the best approach.

claudiofinizio avatar Jan 05 '21 19:01 claudiofinizio

Just as a note, not sure how relevant it is to this issue: I tried to add type annotations to the python-measurement library a while ago, hoping that I could write something like l: Length = Length(2, "m") / 5 or v: Speed = Length(2, "m") / Time(1.5, "s") if there is an appropriate @overload annotation for Length.__div__. However, as I briefly summarized in https://github.com/coddingtonbear/python-measurement/issues/43#issuecomment-619821850 (enum item (3)) and also discussed in https://github.com/python/mypy/issues/4985#issuecomment-616979469, annotations for operators like __mul__ and __div__ are a bit trickier than for ordinary methods, because the resulting type of a * b is not only determined by the left operand's __mul__ method, but could also come from the right operand's __rmul__ method. As I wrote above, I'm not sure how relevant this is for annotating the Pint module, but you may hit this at some point, so I just wanted to leave a note here.
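The operator issue can be seen in a few lines of plain Python. The classes below are toys invented for this illustration (they are not from python-measurement or pint); the point is only that the result type of `a * b` may be decided by the right operand.

```python
class Length:
    def __init__(self, meters: float) -> None:
        self.meters = meters

    def __mul__(self, other):
        if isinstance(other, (int, float)):
            return Length(self.meters * other)
        # Returning NotImplemented makes Python try the right operand's __rmul__.
        return NotImplemented


class Scale:
    """Hypothetical right operand that decides the result type itself."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __rmul__(self, other):
        if isinstance(other, Length):
            return f"{other.meters * self.factor} m (scaled)"
        return NotImplemented


a = Length(2.0) * 3           # handled by Length.__mul__
b = Length(2.0) * Scale(3.0)  # falls through to Scale.__rmul__
print(type(a).__name__, type(b).__name__)
```

An annotation on `Length.__mul__` alone therefore cannot describe every result of `Length(...) * x`, which is why overloads for operators are harder to get right than for ordinary methods.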

tgpfeiffer avatar Jan 06 '21 00:01 tgpfeiffer

There are multiple use cases that we should address:

  • Static type analysis with mypy (magnitude type falls under this)
    • Quantity should be a Generic.
  • Documentation (which dimension or unit to expect)
    • it's difficult (see @tgpfeiffer comment) to do static analysis with this info.
  • Runtime check which is related to @hgrecco comment.
    ScalarVelocityQ = Quantity[float, '[speed]']
    q1 = ScalarVelocityQ(3, 'm/s')
    q2 = ScalarVelocityQ(3, 's') # Exception is raised
    

IMO the best option is:

Make Quantity a Generic, and use a utility class that returns Annotated types (PEP 593) whose metadata can be used for runtime checks.


from typing import Generic, Iterator, TypeVar

T = TypeVar("T")

class Quantity(Generic[T], QuantityGeneric, PrettyIPython, SharedRegistryObject):
    ...

    @property
    def magnitude(self) -> T:
        """Quantity's magnitude. Long form for `m`"""
        return self._magnitude

    ...

    def __iter__(self) -> Iterator[T]:
        ...

    def to(self, other=None, *contexts, **ctx_kwargs) -> "Quantity[T]":
        ...

I tried something like this:


from typing import _tp_cache, _type_check
from typing import _AnnotatedAlias


class QuantityAlias(_AnnotatedAlias, _root=True):
    def __call__(self, *args, **kwargs):
        quantity = super().__call__(*args, **kwargs)
        
        if self.__metadata__:
            dim = quantity._REGISTRY.get_dimensionality(self.__metadata__[0])
            if not quantity.check(dim):
                raise TypeError("Dimensionality not matched")

        return quantity


class TypedQuantity:
    @_tp_cache
    def __class_getitem__(cls, params):
        from pint.quantity import Quantity
        msg = "TypedQuantity[t, ...]: t must be a type."
        origin = _type_check(Quantity[params[0]], msg)
        metadata = tuple(params[1:])
        return QuantityAlias(origin, metadata)

Here we make a simple check at runtime for the dimension, just like in @hgrecco's example.

So TypedQuantity[float, "[length]"] will be translated to Annotated[Quantity[float], "[length]"]

We could go further like it is done here https://docs.python.org/3/library/typing.html#typing.Annotated.

We could translate to something like Annotated[Quantity[float], DimensionCheck("length")].

Those metadata can be added to the instance if needed.
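A sketch of the metadata-object variant follows, using only the standard library. `DimensionCheck` is the name floated above, but its fields, the toy `Quantity` dataclass, and the `validate` helper are all assumptions made for this example rather than anything pint provides.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class DimensionCheck:
    """Hypothetical metadata object carrying the expected dimension."""

    dimension: str


@dataclass
class Quantity:
    """Toy quantity with a hand-written dimension instead of real unit handling."""

    magnitude: float
    dimension: str


Length = Annotated[Quantity, DimensionCheck("[length]")]


def validate(func, **kwargs) -> None:
    """Raise TypeError if a keyword argument violates its DimensionCheck."""
    hints = get_type_hints(func, include_extras=True)
    for name, value in kwargs.items():
        # Annotated aliases expose their extra arguments as __metadata__.
        for meta in getattr(hints.get(name), "__metadata__", ()):
            if isinstance(meta, DimensionCheck) and value.dimension != meta.dimension:
                raise TypeError(
                    f"{name}: expected {meta.dimension}, got {value.dimension}"
                )


def pipeline_cost(length: Length) -> float:
    return 10.0 * length.magnitude


validate(pipeline_cost, length=Quantity(5.0, "[length]"))  # passes silently
print(pipeline_cost(Quantity(5.0, "[length]")))
```

Using a dedicated metadata class instead of a bare string lets a checker pick out its own metadata with `isinstance` and ignore anything other tools attached to the same `Annotated` type.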

I'll try to draft a PR.

jules-ch avatar Jan 06 '21 22:01 jules-ch

@jules-ch I really like your proposal. I am eager to see the draft PR. Great discussion everybody!

hgrecco avatar Jan 08 '21 22:01 hgrecco

I would like to make a plug within my company's software team to use pint for units. Having typing is a huge plus.

I see https://github.com/hgrecco/pint/pull/1259 was merged, is that the only PR needed for typing, or is there more work to be done? When do you think a release will be cut that incorporates that PR?

jamesbraza avatar Aug 10 '21 22:08 jamesbraza

We'll make 0.18 release soon, prob end of the month.

pint typing support will be experimental at first, I still need to document it. I'll push for a new version of documentation, just haven't got the time lately.

jules-ch avatar Aug 14 '21 21:08 jules-ch

Hi. I'm currently experimenting with the new typing features in v0.18 (#1259). How would I annotate functions or classes that handle float / np.ndarray equivalently to Quantity[float] / Quantity[np.ndarray]? For example, how would I annotate the following generic function correctly:

from typing import TypeVar
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

def get_index(array: A, i: int) -> ???:
    return array[i]

I am aware that the same is relatively straightforward for example for lists,

from typing import TypeVar, List

T  = TypeVar('T')

def get_index(l: List[T], i: int) -> T:
    return l[i]

but I'm having a hard time translating it to the pint.Quantity context.

nunupeke avatar Nov 17 '21 13:11 nunupeke

I think you'd need to use numpy.typing.NDArray[X] rather than numpy.ndarray and then you can return X, see https://stackoverflow.com/a/68817265/3663881 (although array[i] could be something else than X if array is a higher-dimensional array; I guess we need to wait for shape support in numpy.typing before you can actually write that safely).

tgpfeiffer avatar Nov 17 '21 13:11 tgpfeiffer

Ok, you are right. My example function is not ideal. What I was really trying to find is an annotation that says: "if you use numpy arrays here, expect scalars there" and equivalently "if you use array quantities here, expect scalar quantities there" or vice versa. Another example:

from typing import TypeVar, Generic
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

class Converter(Generic[A]):
    def __init__(self, scale: "float in case A is np.ndarray / Quantity[float] in case A is Quantity[np.ndarray]"):
        self.scale = scale

    def convert(self, array: A) -> A:
        return array/self.scale

nunupeke avatar Nov 17 '21 13:11 nunupeke

I see. I think in that case you are looking for typing.overload, there you can have multiple annotations for the same function that specify further what goes in and out.

For the function you are implementing I think you will need a type annotation like

def get_index(array: Union[np.ndarray, Quantity[np.ndarray]], i: int) -> Union[float, Quantity[float]]:
    return array[i]

but, as you write, that's not specific enough: a mypy run on

data = np.asarray([3., 4.])
data_q = Q_(data, 'meter')

reveal_type(get_index(data, 0))
reveal_type(get_index(data_q, 0))

prints

test.py:20: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"
test.py:21: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"

If you add @overload declarations like

@overload
def get_index(array: np.ndarray, i: int) -> float: ...

@overload
def get_index(array: Quantity[np.ndarray], i: int) -> Quantity[float]: ...

then mypy prints

test.py:20: note: Revealed type is "builtins.float"
test.py:21: note: Revealed type is "pint.quantity.Quantity[builtins.float]"

tgpfeiffer avatar Nov 17 '21 23:11 tgpfeiffer

I'm now suddenly interested in this. We have data providers handing us a mishmash of TWh and PJ energy generation data and we'd like to keep our units straight. We are also using Pydantic. My first attempt to add a Quantity field resulted in this error message (using Pint 0.18):

TypeError: Fields of type "<class 'pint.quantity.Quantity'>" are not supported.

Worked around by adding

    class Config:
        arbitrary_types_allowed = True

to the models I'm enhancing with Quantity.

MichaelTiemannOSC avatar Dec 27 '21 15:12 MichaelTiemannOSC

Super interested in the use of Pint type hinting with Pydantic types.

Wondering if you were able to add something like PositiveFloat or other Pydantic types to your example, @MichaelTiemannOSC?


from pydantic import BaseModel, PositiveFloat
from pint import Quantity

            
class PowerPlant(BaseModel):
    power_generation: Quantity['watt']
    class Config:
        arbitrary_types_allowed = True

noor_solar = PowerPlant(power_generation=Quantity(160, 'megawatt'))

noor_solar.power_generation

shimwell avatar Jan 11 '22 13:01 shimwell

Should be able to share some findings soon. I have an issue filed with pandas to sort out an ExtensionArray problem (https://github.com/pandas-dev/pandas/issues/45240) and am working with some smart people (copied) on how to make this play well with both database connectors and REST APIs.

@erikerlandson @caldeirav @joriscram

MichaelTiemannOSC avatar Jan 11 '22 13:01 MichaelTiemannOSC

@hgrecco astropy introduced something similar that we can implement, using Annotated typing that I outlined in previous comments.

https://github.com/astropy/astropy/commit/0deb5c545b5b1fe47361ed5a02a86fe9ef16d3ec

jules-ch avatar Mar 02 '22 21:03 jules-ch

Really curious on any progress on this as I'm getting into this very topic and have some ugly workarounds like:

import pint
from pydantic import BaseModel, validator

ureg = pint.UnitRegistry()

class MyModel(BaseModel):
    distance: str
    
    @validator("distance")
    def is_length(cls, v):
        q = ureg.Quantity(v)
        assert q.check("[length]"), "dimensionality must be [length]"
        return q

>>> MyModel(distance="2 ly").distance
2 light_year

deeplook avatar May 03 '22 11:05 deeplook

I made a quick, slightly nicer, workaround based off your workaround

from pydantic import BaseModel
import pint


class PintType:
    Q = pint.Quantity

    def __init__(self, q_check: str):
        self.q_check = q_check

    def __get_validators__(self):
        yield self.validate

    def validate(self, v):
        q = self.Q(v)
        assert q.check(self.q_check), f"Dimensionality must be {self.q_check}"
        return q


Length = PintType("[length]")

class MyModel(BaseModel):
    distance: Length

    class Config:
        json_encoders = {
            pint.Quantity: str
        }

mcleantom avatar May 03 '22 16:05 mcleantom

Indeed, thanks!

deeplook avatar May 03 '22 18:05 deeplook

Thank you all for posting this, it has been incredibly helpful.

One thing worth mentioning is that I was having issues with the example above because the fields were objects and not classes, so I tweaked things a bit to support jsonschema output and assignment validation.

Here is a public gist with a more complete example.

Open to any suggestions on how to improve this:

from pint import Quantity, Unit, UnitRegistry
from pydantic import BaseModel


registry = UnitRegistry()

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
    )
])


def quantity(dimensionality: str) -> type:
    """A method for making a pydantic compliant Pint quantity field type."""

    try:
        registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        assert quantity.check(cls.dimensionality), f"Dimensionality must be {cls.dimensionality}"
        return quantity

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/Quantity"}
        )
    
    return type(
        "Quantity",
        (Quantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dimensionality=dimensionality,
            validate=validate,
        ),
    )


class MyModel(BaseModel):

    distance: quantity("[length]")
    speed: quantity("[length]/[time]")

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }

model = MyModel(distance="1.5 ly", speed="15 km/hr")
model
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# check the jsonschema, could make the definition for Quantity better...
print(MyModel.schema_json(indent=2))
>>> {
  "title": "MyModel",
  "type": "object",
  "properties": {
    "distance": {
      "$ref": "#/definitions/Quantity"
    },
    "speed": {
      "$ref": "#/definitions/Quantity"
    }
  },
  "required": [
    "distance",
    "speed"
  ],
  "definitions": [
    {
      "Quantity": {
        "type": "string"
      }
    }
  ]
}

# convert to a python dictionary
model.dict()
>>> {'distance': 1.5 <Unit('light_year')>, 'speed': 15.0 <Unit('kilometer / hour')>}

# serialize to json
print(model.json(indent=2))
>>> {
  "distance": "1.5 light_year",
  "speed": "15.0 kilometer / hour"
}

import json

# load from json
MyModel.parse_obj(json.loads(model.json()))
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# test that it raises error when assigning wrong quantity kind
model.distance = "2 m/s"

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [14], line 1
----> 1 model.distance = "2 m/s"

File C:\mf\envs\jafte\lib\site-packages\pydantic\main.py:385, in pydantic.main.BaseModel.__setattr__()

ValidationError: 1 validation error for MyModel
distance
  Dimensionality must be [length] (type=assertion_error)

sanbales avatar Sep 24 '22 18:09 sanbales

@sanbales that was incredibly helpful code! I'm now trying to build a production_quantity function that validates that a given Quantity is among the types of quantities that we deal with in "production". I have written this:

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
        ProductionQuantity=dict(type="List[str]"),
    )
])

class ProductionQuantity(BaseModel):

    dims_list: List[str]

    @validator('dims_list')
    def units_must_be_registered(cls, v):
        for d in v:
            try:
                registry.get_dimensionality(d)
            except KeyError:
                raise ValueError(f"{d} is not a valid dimensionality in pint!")
        return v

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }

def production_quantity(dims_list: List[str]) -> type:
    """A method for making a pydantic compliant Pint production quantity."""

    try:
        for dimensionality in dims_list:
            registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        for dimensionality in cls.dims_list:
            if quantity.check(dimensionality):
                return quantity
        raise DimensionalityError(value.units, f"in [{cls.dims_list}]")

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/ProductionQuantity"}
        )
    
    return type(
        "ProductionQuantity",
        (ProductionQuantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dims_list=dims_list,
            validate=validate,
        ),
    )

But pydantic gives me this error, which I haven't been able to fully grok:

TypeError: The type of ProductionQuantity.dims_list differs from the new default value; if you wish to change the type of this field, please use a type annotation

MichaelTiemannOSC avatar Oct 09 '22 16:10 MichaelTiemannOSC