rioxarray Better typings for `open

Hi all!

I was wondering if it could be considered to improve the typings for rioxarray.open_rasterio.

Indeed, the returned typings are

xarray.Dataset | xarray.DataArray | list[xarray.Dataset]

Creating so many typing issues in the code, like:

Cannot access attribute "attrs" for class "list[Dataset]"
  Attribute "attrs" is unknown. Pylance[reportAttributeAccessIssue]

The solution would be to ignore the type or check everywhere if the instance is a list, which is not a clean solution IMO.

It would be amazing if you would consider only returning xarray.Dataset | xarray.DataArray for this function.

Have a great day!

Apr 29 '25 14:04 renaudjester

Not all datasets have uniform resolutions. Due to this, a list can be returned with datasets grouped by resolution: https://github.com/corteva/rioxarray/blob/2a1f01e6fd141bdb1f87f4222165cf34f517dd97/test/integration/test_integration__io.py#L224

Apr 30 '25 01:04 snowman2

Maybe it should be another rioxarray function that can open datasets grouped by resolution.

I think I get the idea of "why" it is like this, but I still think it's not very convenient when developing. That's why I consider it an improvement if it could work differently. Like with rioxarray.open_dataset and then rioxarray.open_groups or else.

Apr 30 '25 07:04 renaudjester

The interface you want already exists:


import xarray

dataset = xarray.open_dataset(..., engine="rasterio")
array = xarray.open_dataarray(..., engine="rasterio")

Apr 30 '25 10:04 snowman2

True! I will be using this then! Thanks for your answer, feel free to close this issue if you think that this is enough! Have a great day!

Apr 30 '25 17:04 renaudjester

@snowman2 one can probably use @overload annotations to support DataArray vs Dataset|list[Dataset] disambiguation, switched on literal value passed to band_as_variable= parameter, they are a bit of pain to maintain when there are a lot of common arguments though.

here is an example use from odc-geo, that allows us to know that norm_crs(None|Unset()) -> None, and hence any other input should be producing CRS and not CRS|None:

# fmt: off
@overload
def norm_crs(crs: SomeCRS) -> CRS: ...
@overload
def norm_crs(crs: SomeCRS, ctx: Any) -> CRS: ...
@overload
def norm_crs(crs: Union[None, Unset]) -> None: ...
@overload
def norm_crs(crs: Union[None, Unset], ctx: Any) -> None: ...
# fmt: on


def norm_crs(crs: MaybeCRS, ctx=None) -> Optional[CRS]:

May 01 '25 04:05 Kirill888

one can probably use @overload annotations to support DataArray vs Dataset|list[Dataset] disambiguation

The return type is determined by the contents of the file and not by a kwarg. So, overload likely won't help in this situation.

May 01 '25 11:05 snowman2

The interface you want already exists:

import xarray array = xarray.open_dataarray(..., engine="rasterio")

Unfortunately for me this was not a drop-in replacement for open_rasterio with single band tiffs, I ended up with ValueError: cannot convert float NaN to integer which I wasn't going to chase down. Instead I kept pyright happy with a type check:

    da = rioxarray.open_rasterio(path)
    if not isinstance(da, DataArray):
        raise ValueError(f"Unexpected return type from open_rasterio: {da}")

Jun 03 '25 10:06 cas--

Not all datasets have uniform resolutions. Due to this, a list can be returned with datasets grouped by resolution:

rioxarray/test/integration/test_integration__io.py

Line 224 in 2a1f01e assert isinstance(rds_list, list)

@snowman2 I think returning a DataTree instead of a list of Dataset would be nice, right? + it would solve @renaudjester's issue 🤔

Nov 18 '25 17:11 acoque

I think returning a DataTree instead of a list of Dataset would be nice, right?

Yes, that would be way better. The method pre-dates DataTree in xarray, but I think it is time for an upgrade 😄. Contributions welcome!

Nov 18 '25 19:11 snowman2

Better typings for `open_rasterio`