Typing for multi-dimensional arrays

Open shoyer opened this issue 7 years ago • 20 comments

I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (https://github.com/numpy/numpy/issues/7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).

To help guide discussion, I wrote a document outlining ideas for array shape typing.

To summarize:

  • We would like to be able to type-check both data types (e.g., float64) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.
  • There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape (N, M) to shape (N,) for arbitrary integers N and M. These dimension variables look very similar to TypeVar, if TypeVar supported integers as types.
  • A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., (..., N) for an array with a last dimension of length N and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.
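
The broadcasting rule mentioned in the last bullet can be sketched in plain Python. This is only an illustration of the rule a type checker would have to encode: `broadcast_shapes` below is a hypothetical helper written for this example (it does not call NumPy), and shapes are assumed to be plain tuples of ints.

```python
from itertools import zip_longest


def broadcast_shapes(a: tuple, b: tuple) -> tuple:
    """Return the shape obtained by broadcasting shapes a and b together.

    Dimensions are aligned from the right; a missing dimension or a
    dimension of length 1 stretches to match the other operand.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast together")
    return tuple(reversed(result))


print(broadcast_shapes((5, 1, 4), (3, 1)))  # (5, 3, 4)
print(broadcast_shapes((2, 3), (3,)))       # (2, 3)
```

A static checker would need to apply the same right-aligned matching symbolically, with dimension variables in place of concrete integers.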

This will likely require some new typing features (as well as type-checker support). Notably:

  • Support for literal values (https://github.com/python/typing/issues/478), so we can type check operations like array.sum(axis=0).
  • Variadic generics (https://github.com/python/typing/issues/193), so we can write types like NDArray[N] and NDArray[N, M].
  • Some sort of support for dimension identity in shapes (e.g., integer types, or DimensionVar as described in my doc).
  • Standard syntax for writing array dtype/shape annotations: what should these look like?
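
To make the wish list above more concrete, here is a rough sketch of how far ordinary generics and literal types can go today. The `Vector` and `Matrix` classes and the `row_sums` function are hypothetical names invented for this example, not part of NumPy or of any stub package, and nothing here checks shapes at runtime.

```python
from typing import Generic, List, Literal, TypeVar

N = TypeVar("N")  # stands in for the length of the first dimension
M = TypeVar("M")  # stands in for the length of the second dimension


class Vector(Generic[N]):
    """Hypothetical 1-D array whose type carries its length."""

    def __init__(self, data: List[float]) -> None:
        self.data = data


class Matrix(Generic[N, M]):
    """Hypothetical 2-D array whose type carries both dimension lengths."""

    def __init__(self, rows: List[List[float]]) -> None:
        self.rows = rows


def row_sums(a: Matrix[N, M]) -> Vector[N]:
    """Map shape (N, M) to shape (N,): N is shared between input and output."""
    return Vector([sum(row) for row in a.rows])


# Literal types let concrete sizes appear in annotations.
m: Matrix[Literal[2], Literal[3]] = Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
v: Vector[Literal[2]] = row_sums(m)
print(v.data)  # [6.0, 15.0]
```

Note that each rank needs its own class here (`Vector`, `Matrix`, ...), which is the limitation variadic generics are meant to remove.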

shoyer avatar Dec 07 '17 22:12 shoyer

It looks like the proposal of integer generics is also relevant here https://github.com/python/mypy/issues/3345 (it looks almost identical to what you call DimensionVar).

In general, I am very supportive of this project (I have heard many times that static typing would be very helpful for data science, numerics and related fields, but current support in mypy and PEP 484 is very limited). The main obstacle however is the size of this project (it may require its own PEP). I will read your document (thanks for writing it), but already now it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Also tagging @JukkaL here just in case.

ilevkivskyi avatar Dec 07 '17 23:12 ilevkivskyi

The main obstacle however is the size of this project (it may require its own PEP).

Yes, I expect a PEP will be necessary, especially if we want to standardize base types for typing multi-dimensional arrays in the typing module.

it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Indeed, this is probably the best place where the broader typing community can help.

shoyer avatar Dec 09 '17 05:12 shoyer

I've opened a sub-issue for discussing syntax for array typing: https://github.com/python/typing/issues/516

shoyer avatar Dec 10 '17 01:12 shoyer

Some update on the issue:

Our (mypy core team) previous schedule for working on this was Q4 2018. However, we decided that some type system features needed to efficiently support NumPy (such as literal types and variadic generics) will also be useful in general, so we decided to implement general support for such features first. Literal types are almost there, and variadic generics are going to be added in the coming months. After that we will start working on dedicated NumPy support (around Q2); sorry for the delay.

ilevkivskyi avatar Jan 09 '19 11:01 ilevkivskyi

Sorry, I forgot to post notes from the latest Python typing meetup on numeric stack typing here. Here they are

ilevkivskyi avatar Apr 28 '19 01:04 ilevkivskyi

Are you specifically looking at numpy, or at the machine learning ecosystem with numpy/pytorch/... ? I found today that they are quite heterogeneous:

>>> x = torch.zeros([4], dtype=torch.int8)
>>> y = torch.zeros([4], dtype=torch.float32)
>>> torch.add(x, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected type torch.FloatTensor but got torch.CharTensor
>>> xx = numpy.array([4], dtype=numpy.int8)
>>> yy = numpy.array([4], dtype=numpy.float32)
>>> xx + yy
array([8.], dtype=float32)

PyTorch doesn't seem to cast automatically when the types differ, whereas NumPy does some upcasting (see https://stackoverflow.com/questions/56022497/numpy-pytorch-dtype-conversion-compatibility/56022918?noredirect=1#comment98689941_56022918)
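
The two behaviours can be contrasted with a toy model. This is a deliberately simplified sketch: the `_ORDER` list and the `promote` and `strict` functions are made up for illustration and do not reproduce the full promotion rules of either library.

```python
# Toy ordering of a few dtypes from "narrowest" to "widest".
# Assumption: a simplified stand-in, not NumPy's actual promotion table.
_ORDER = ["int8", "int16", "float32", "float64"]


def promote(a: str, b: str) -> str:
    """Promoting policy: the result takes the wider of the two dtypes."""
    return max(a, b, key=_ORDER.index)


def strict(a: str, b: str) -> str:
    """Strict policy: both operands must already share a dtype."""
    if a != b:
        raise TypeError(f"expected {a} but got {b}")
    return a


print(promote("int8", "float32"))  # float32

try:
    strict("float32", "int8")
except TypeError as err:
    print(err)  # expected float32 but got int8
```

A dtype-aware type checker would have to know which of these policies a given library follows before it could type an expression like `x + y`.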

vsiles avatar May 07 '19 15:05 vsiles

Are you specifically looking at numpy, or at the machine learning echosystem with numpy/pytorch/... ?

At all of them. Dimensionality/shape will be an additional abstraction orthogonal to container type and element type.

ilevkivskyi avatar May 07 '19 16:05 ilevkivskyi

Sorry, I wasn't clear; I wanted to ask about the numerical stack part specifically. Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking to their favorite flavor (which seem incompatible with each other)?

vsiles avatar May 07 '19 16:05 vsiles

Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking to their favorite flavor (which seem incompatible with each other)?

There are two separate big things required to support numerical libraries:

  • New type system features
  • Adding stubs for popular libraries

For the first, we ideally want to be as broad as possible; I think there are no particular "preferences". For the second, I think we should probably start with numpy, since it is the common denominator for many other libraries.

ilevkivskyi avatar May 07 '19 17:05 ilevkivskyi

@ilevkivskyi do you have any suggestions for how to track progress on (or, even better, contribute to) the development of these "numeric stack typing" features? Full support for the features described in your linked notes on numeric stack typing would be incredibly useful!

dmontagu avatar Jul 10 '19 19:07 dmontagu

@dmontagu The best way is to just follow this issue; you can also subscribe to the [email protected] mailing list. There are no updates here because we haven't made much progress yet. Whether you can help depends on your background and how much time you are ready to spend on this. This is not a simple feature, and it is hard to split into small "things".

ilevkivskyi avatar Jul 11 '19 00:07 ilevkivskyi

Hey! I'm a student working on a thesis and I am very interested in contributing to this project as part of my research! Mainly, I want to statically check dimensionality alignment in numpy operations. Let me know how I can help out.

theodoretliu avatar Nov 13 '19 01:11 theodoretliu

@theodoretliu Hi! It is great to hear you are interested. Just to get a bit more info, how much time will you be able to spend on this?

The best course of action is probably to implement support for relevant type system features in one of the mainstream Python type checkers. I would of course propose mypy :-) as one of its maintainers, see https://github.com/python/mypy

If this sounds right to you, I can give you a more detailed plan and some guidance.

ilevkivskyi avatar Nov 13 '19 14:11 ilevkivskyi

I'd be willing to dedicate pretty significant time in the coming months. And yes, that sounds like a great course of action!

theodoretliu avatar Nov 13 '19 16:11 theodoretliu

Be sure to talk to Mark Mandoza to have input from our experience doing so in Pyre :D

vsiles avatar Nov 13 '19 16:11 vsiles

Mark Mendoza* ... my fingers are a bit dumb today, sorry.

vsiles avatar Nov 13 '19 16:11 vsiles

A group of us at DeepMind are interested in working on this too. We've set up a mailing list at https://groups.google.com/g/python-shape-checkers to try to bring all the conversations about this together in one place. I've posted a summary there of what seems to be the current state of things, but stay tuned for updates!

mrahtz avatar Jun 12 '20 11:06 mrahtz

Hi @mrahtz,

Thanks for the initiative! Indeed, there are currently a lot of ongoing efforts in this direction. At Facebook we are currently working directly on this, and already support several use cases with Pyre, including support for variadic syntax, which has been polished relative to the initial proposal at the Python Typing Summit. However, it would be very beneficial to get first-hand information on the state of each team working on this, since so far I have read about people working on it at Dropbox, Facebook, Google and now DeepMind.

Also, please don't miss the Python Typing mailing list.

fylux avatar Jun 12 '20 12:06 fylux

You wanted this annotation:

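class Float64:  # Custom annotation class
    def __init__(self, ndim=None):
        self.ndim = ndim  # number of dimensions, e.g. 2 for float64[:, :]

    def __getitem__(self, item):
        # Record how many slices were given so that float64[:] and
        # float64[:, :] can be told apart.
        ndim = len(item) if isinstance(item, tuple) else 1
        return Float64(ndim)


float64 = Float64()


def for_loop(n: float64[:, :]):
    pass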

Take it ;)

redradist avatar Jun 29 '20 17:06 redradist

To solve this issue, "Annotated[]" is already an efficient way to declare the type. However, to get proper static type checking on "Annotated[]" we need support in mypy, pyanalyze, etc. To annotate and infer types with arithmetic from function calls like "np.reshape", we need code that defines custom rules (not just PEP 484) to analyze the types properly. I suspect there is little support for custom "Annotated[]" types, and that it is not easy for users to define and statically check their own "Annotated[]" types. That is probably the solution for all kinds of dynamic types in Python, since it would enable symbolic execution of arbitrary Python code.
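
As a rough sketch of the idea, shape metadata can be attached with `Annotated` and read back at runtime. The `Shape` marker, `shape_of`, and `check_call` below are hypothetical helpers written for this example (Python 3.9+ assumed); a static checker ignores this metadata unless it has dedicated support, which is the gap described above.

```python
from dataclasses import dataclass
from typing import Annotated, List, get_type_hints


@dataclass(frozen=True)
class Shape:
    """Hypothetical metadata marker: the expected shape of a nested list."""
    dims: tuple


def shape_of(value) -> tuple:
    """Compute the shape of a rectangular nested list (no raggedness check)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def check_call(func, *args):
    """Call func after checking positional args against Shape metadata."""
    hints = get_type_hints(func, include_extras=True)
    for name, arg in zip(func.__code__.co_varnames, args):
        for meta in getattr(hints.get(name), "__metadata__", ()):
            if isinstance(meta, Shape) and shape_of(arg) != meta.dims:
                raise TypeError(
                    f"{name}: expected shape {meta.dims}, got {shape_of(arg)}"
                )
    return func(*args)


Matrix2x3 = Annotated[List[List[float]], Shape((2, 3))]


def total(m: Matrix2x3) -> float:
    return sum(sum(row) for row in m)


print(check_call(total, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 21.0
```

This only validates concrete values at call time; inferring the result shape of something like "np.reshape" would still need custom rules inside the checker.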

James4Ever0 avatar Jul 07 '23 05:07 James4Ever0