Adding Python Stubs or Type Hints to gdal

Open julian-belina opened this issue 4 months ago • 10 comments

Feature description

I think it would be useful to add Python stubs to GDAL that provide static information about the expected data types. These would help to identify incorrect or unintended usage of GDAL. A quick search yielded no results. Has this already been tried or deemed unimportant?

Here is a link to further information about Python stubs:

https://typing.python.org/en/latest/guides/writing_stubs.html

Additional context

No response

julian-belina avatar Aug 14 '25 12:08 julian-belina

In https://github.com/OSGeo/gdal/pull/8399 I can see

Introducing stubs to support type changes would be a drag, hopefully that could be avoided.

I do not know what it means, though.

jratike80 avatar Aug 14 '25 12:08 jratike80

Hmm, I don't fully understand the post either, but it seems they are discussing C data types. Python stubs simply provide static information about the expected input and output types of Python objects and functions. They can also be used to ensure the expected behaviour during testing.

julian-belina avatar Aug 14 '25 13:08 julian-belina

@dbaston I've never heard of Python stubs. Are you familiar with them?

rouault avatar Aug 14 '25 14:08 rouault

For example, Pandas provides these: https://github.com/pandas-dev/pandas-stubs

julian-belina avatar Aug 14 '25 14:08 julian-belina

As a frequent user of gdal-python in a large codebase where we try to add type hints as much as possible (we use pyright as our type checker), I would like to say that I am very much in favour of adding type stubs to gdal-python. Though I do realize this is probably a large task, and in some places it might even be impossible.

NB: the official Python docs on typing are here: https://docs.python.org/3/library/typing.html. Gentler introductions are easy to find online, e.g. https://realpython.com/python-type-checking/

vincentschut avatar Aug 14 '25 14:08 vincentschut

@dbaston I never heard about Python stubs. Are you familiar with that?

I understand it to mean that we provide type annotations, but as a separate file rather than inline with the function definitions.
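To make the inline-versus-stub distinction concrete, here is a minimal sketch. The function `scale` and the stub text are hypothetical illustrations, not GDAL code. The same signature is written once inline and once as the contents of what would be a separate `.pyi` file, and the standard `ast` module is used only to show that stub text is ordinary Python syntax with `...` in place of bodies and default values.

```python
import ast
from typing import get_type_hints


# Inline style: the annotations sit next to the implementation.
def scale(values: list[float], factor: float = 1.0) -> list[float]:
    return [v * factor for v in values]


# Stub style: the same signature as it could appear in a separate
# "scale.pyi" file (hypothetical). Bodies and defaults are elided with "...".
STUB_SOURCE = """
def scale(values: list[float], factor: float = ...) -> list[float]: ...
"""

# A stub file is syntactically valid Python, so it can be parsed like any module.
stub_func = ast.parse(STUB_SOURCE).body[0]
stub_return = ast.unparse(stub_func.returns)

# The inline annotations are available at runtime through typing.get_type_hints.
inline_return = get_type_hints(scale)["return"]

print(stub_return, inline_return)
```

A type checker reads either form; the stub form is the one that works when the implementation is a compiled extension or generated code that cannot easily carry annotations itself.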

It would be nice if SWIG could generate these annotations, but it does not (https://github.com/swig/swig/issues/735)

It would be annoying to manually specify the types both in docstrings (as we are doing now) and in annotations. Here is a rather new project attempting to generate the annotations from docstrings: https://github.com/scientific-python/docstub

dbaston avatar Aug 14 '25 14:08 dbaston

The main issue I wanted to raise is adding type hints to GDAL. I wasn't aware that there might be other options for Python type hints for code that isn't written natively in Python.

julian-belina avatar Aug 15 '25 07:08 julian-belina

It would be annoying to manually specify the types both in docstrings (as we are doing now) and in annotations.

I don't know SWIG or how the GDAL documentation is generated, but I think the standard approach elsewhere is to use something like sphinx-autodoc-typehints, so the type annotations on the function get injected into the generated documentation instead of vice versa.
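As a rough illustration of the "annotations feed the documentation" direction, the sketch below builds reST-style type fields from a function's annotations using only the standard `inspect` module. It is not how sphinx-autodoc-typehints is implemented, and `resample` and `describe` are hypothetical names made up for this example.

```python
import inspect


def resample(width: int, height: int, method: str = "nearest") -> bool:
    """Resample a raster to the given size (hypothetical function)."""
    return True


def describe(func) -> str:
    """Render a signature plus reST-style type fields from the annotations."""
    sig = inspect.signature(func)
    lines = [f"{func.__name__}{sig}"]
    for name, param in sig.parameters.items():
        # Assumes every parameter is annotated with a plain class such as int.
        lines.append(f"  :type {name}: {param.annotation.__name__}")
    lines.append(f"  :rtype: {sig.return_annotation.__name__}")
    return "\n".join(lines)


print(describe(resample))
```

With this direction the types are written once, in the signature, and the docstring only needs to describe meaning.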

My guess is that Python type checkers are nowadays far more widely used and tested for checking type annotations than for checking types taken from docstrings.

pjonsson avatar Aug 15 '25 12:08 pjonsson

I've been thinking about this for a while. Maybe the best approach is to have a scratch repo and seed it with the output of mypy's stubgen or of pyright --createstub. Then the community can collaborate until each .pyi file is good enough to bring over to the main repo.
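To give a feel for what "seeding" a stub means, here is a toy sketch that drafts stub lines from runtime introspection. It is far cruder than stubgen or pyright and is not how either tool works; `draft_stub` and the `demo` module are hypothetical. The draft carries no types at all, which is exactly the part the collaboration would have to fill in.

```python
import inspect
import types


def draft_stub(module: types.ModuleType) -> str:
    """Emit one untyped stub line per public function found in a module."""
    lines = []
    for name, obj in sorted(vars(module).items()):
        # Skip private names; classes and constants are ignored in this toy version.
        if name.startswith("_") or not inspect.isroutine(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some compiled callables expose no signature at runtime.
            signature = "(*args, **kwargs)"
        lines.append(f"def {name}{signature}: ...")
    return "\n".join(lines)


# Build a throwaway module so the example is self-contained.
demo = types.ModuleType("demo")
exec("def add(a, b=2):\n    return a + b\n\ndef _private():\n    pass\n", demo.__dict__)

print(draft_stub(demo))
```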

https://typing.python.org/en/latest/guides/writing_stubs.html

And thanks to whoever added type annotations in https://github.com/OSGeo/gdal/tree/master/swig/python/gdal-utils/osgeo_utils/auxiliary

schwehr avatar Aug 15 '25 15:08 schwehr

This task is too big for me to get involved in, but I recently type annotated a couple of thousand functions so I'll mention a few things that helped me.

  1. autotyping can do a bunch of the simple stuff without any manual intervention. It has a --safe flag and an --aggressive flag. Initially I thought aggressive mode meant "the generated annotations might be wrong", but I'm no longer so sure; it's possible the documentation is just trying to say "the generated type annotations are correct, but there is a greater chance the type checker will fail once these annotations are in place". If you already have a Python environment up and running, it's literally 2 minutes of work to run this tool, so I see no reason to avoid it.
  2. autotyping can also use the output from pyanalyze. The process was automatic (and things were much faster than doing it by hand), but the inferred types weren't necessarily complete, so assume there is some manual work left to get things production ready.
  3. pyre infer can also infer types and put the results into type annotations. The same caveat applies as with pyanalyze: the inferred types aren't necessarily complete. I ran this on my previous computer so I no longer have the configuration, but I remember it took me about an hour to configure pyre to infer things only for my package and not step into .venv and try to infer everything there (there are many dependencies in that directory, so inferring their types takes forever).
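One example of the "simple stuff" that can be automated is spotting functions that never return a value, which can safely be annotated `-> None`. The sketch below illustrates that idea with the standard `ast` module. It is an assumption-laden toy, not the implementation of autotyping, pyanalyze, or pyre, and `functions_safe_for_none_return` is a hypothetical helper.

```python
import ast


def functions_safe_for_none_return(source: str) -> list[str]:
    """List unannotated functions whose bodies never return, yield, or raise."""
    blockers = (ast.Return, ast.Yield, ast.YieldFrom, ast.Raise)
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.returns is not None:
            continue  # already has a return annotation
        # Conservative: a return inside a nested function also disqualifies the outer one.
        if any(isinstance(child, blockers) for child in ast.walk(node)):
            continue
        names.append(node.name)
    return names


SAMPLE = '''
def log(msg):
    print(msg)

def double(x):
    return 2 * x

def fail():
    raise ValueError("no")

def typed() -> None:
    pass
'''

print(functions_safe_for_none_return(SAMPLE))
```

Rules of this kind are mechanical, which is why running a tool first and reserving manual effort for the harder signatures saves so much time.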

I remember I started with autotyping, but I don't remember the order in which I ran pyanalyze and pyre. It's possible that I ended up iterating between them, because the annotations added by one tool helped the other figure out more things.

Here is a list of other tools that might be helpful. I tried some of the others that I haven't mentioned and they didn't help my situation, but that could be tied to my circumstances.

I don't know how many functions GDAL has, but I suspect it is many, so having the CI check the type annotations will more or less be a requirement to keep things consistent.
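A CI check of this kind can be as small as running the type checker and failing the job on a non-zero exit code. The following is a minimal sketch under that assumption; `run_check` is a hypothetical helper, and the mypy invocation in the comment assumes mypy is installed and uses an illustrative path.

```python
import subprocess
import sys


def run_check(command: list[str]) -> bool:
    """Run a checker command and report whether it exited successfully."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the checker's own report so the CI log shows what failed.
        print(result.stdout, end="")
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0


# Possible CI usage (assumption: mypy is installed, path is illustrative):
#     sys.exit(0 if run_check([sys.executable, "-m", "mypy", "swig/python"]) else 1)
```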

I tried stepping into swig/python and running mypy . and pyright, and both currently report 400+ errors. I'm sure some of the issues arise because the type checker sometimes gets confused, and some because I don't have a Python environment with the right packages and type stubs, but some look like real errors, like this one:

gdal-utils/osgeo_utils/auxiliary/color_table.py:84: error: Argument 1 to "color_table_from_color_palette" has incompatible type "ColorPalette | None"; expected "ColorPalette"  [arg-type]

One possibility is that get_color_palette should not begin by checking whether its argument is None, since the declared type for that argument does not include None. Another is that the argument simply has the wrong type annotation.
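The general pattern behind an arg-type error like this one can be reproduced in a few lines. The sketch below uses hypothetical stand-ins (`ColorPalette`, `load_palette`, `table_from_palette`, `build_table`) rather than the real osgeo_utils code: a value that may be None is passed to a function that does not accept None, and the fix shown is to narrow the type with an explicit check before the call.

```python
from typing import Optional


class ColorPalette:
    """Hypothetical stand-in, not the osgeo_utils class."""

    def __init__(self, colors: dict[int, str]) -> None:
        self.colors = colors


def load_palette(source: Optional[str]) -> Optional[ColorPalette]:
    # May return None, so callers receive "ColorPalette | None".
    if source is None:
        return None
    return ColorPalette({0: source})


def table_from_palette(palette: ColorPalette) -> list[tuple[int, str]]:
    # Declared to require a real ColorPalette, never None.
    return sorted(palette.colors.items())


def build_table(source: Optional[str]) -> list[tuple[int, str]]:
    palette = load_palette(source)
    # Without this check a type checker would flag the call below,
    # because palette could still be None at that point.
    if palette is None:
        return []
    return table_from_palette(palette)


print(build_table("red"), build_table(None))
```

Whether the right fix is a check like this or a corrected annotation depends on what the function is actually meant to accept, which is the question raised above.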

Most of the stuff I've mentioned is not particularly difficult, just tedious. I'm absolutely not trying to discourage anyone from starting; I believe having type annotations would be really valuable.

pjonsson avatar Aug 15 '25 22:08 pjonsson