rioxarray
rioxarray copied to clipboard
Better typings for `open_rasterio`
Hi all!
I was wondering if it could be considered to improve the typings for rioxarray.open_rasterio.
Indeed, the returned typings are
xarray.Dataset | xarray.DataArray | list[xarray.Dataset]
Creating so many typing issues in the code, like:
Cannot access attribute "attrs" for class "list[Dataset]"
Attribute "attrs" is unknown. Pylance[reportAttributeAccessIssue]
The solution would be to ignore the type or check everywhere if the instance is a list, which is not a clean solution IMO.
It would be amazing if you would consider only returning xarray.Dataset | xarray.DataArray for this function.
Have a great day!
Not all datasets have uniform resolutions. Due to this, a list can be returned with datasets grouped by resolution: https://github.com/corteva/rioxarray/blob/2a1f01e6fd141bdb1f87f4222165cf34f517dd97/test/integration/test_integration__io.py#L224
Maybe it should be another rioxarray function that can open datasets grouped by resolution.
I think I get the idea of "why" it is like this, but I still think it's not very convenient when developing. That's why I consider it an improvement if it could work differently. Like with rioxarray.open_dataset and then rioxarray.open_groups or else.
The interface you want already exists:
import xarray
dataset = xarray.open_dataset(..., engine="rasterio")
array = xarray.open_dataarray(..., engine="rasterio")
True! I will be using this then! Thanks for your answer, feel free to close this issue if you think that this is enough! Have a great day!
@snowman2 one can probably use @overload annotations to support DataArray vs Dataset|list[Dataset] disambiguation, switched on literal value passed to band_as_variable= parameter, they are a bit of pain to maintain when there are a lot of common arguments though.
here is an example use from odc-geo, that allows us to know that norm_crs(None|Unset()) -> None, and hence any other input should be producing CRS and not CRS|None:
# fmt: off
@overload
def norm_crs(crs: SomeCRS) -> CRS: ...
@overload
def norm_crs(crs: SomeCRS, ctx: Any) -> CRS: ...
@overload
def norm_crs(crs: Union[None, Unset]) -> None: ...
@overload
def norm_crs(crs: Union[None, Unset], ctx: Any) -> None: ...
# fmt: on
def norm_crs(crs: MaybeCRS, ctx=None) -> Optional[CRS]:
one can probably use @overload annotations to support DataArray vs Dataset|list[Dataset] disambiguation
The return type is determined by the contents of the file and not by a kwarg. So, overload likely won't help in this situation.
The interface you want already exists:
import xarray array = xarray.open_dataarray(..., engine="rasterio")
Unfortunately for me this was not a drop-in replacement for open_rasterio with single band tiffs, I ended up with ValueError: cannot convert float NaN to integer which I wasn't going to chase down. Instead I kept pyright happy with a type check:
da = rioxarray.open_rasterio(path)
if not isinstance(da, DataArray):
raise ValueError(f"Unexpected return type from open_rasterio: {da}")
Not all datasets have uniform resolutions. Due to this, a list can be returned with datasets grouped by resolution:
rioxarray/test/integration/test_integration__io.py
Line 224 in 2a1f01e assert isinstance(rds_list, list)
@snowman2 I think returning a DataTree instead of a list of Dataset would be nice, right? + it would solve @renaudjester's issue 🤔
I think returning a DataTree instead of a list of Dataset would be nice, right?
Yes, that would be way better. The method pre-dates DataTree in xarray, but I think it is time for an upgrade 😄. Contributions welcome!