rioxarray
Allow multiple nodata values to be passed to reproject
For datasets with several data variables of different datatypes, it would be helpful to set a nodata value for each data variable.
Current Behavior
The nodata keyword argument for rio.reproject accepts a single value. If nodata is None, a default value is chosen based on the data type.
Suggested Behavior
Allow nodata to accept a scalar or a dict, where the dict is {'var1': nodata_value_var1, 'var2': nodata_value_var2}.
The for-loop in rioxarray.raster_dataset.reproject would then look up the nodata value for each data variable:
for var in self.vars:
    <snip>
    if isinstance(nodata, dict):
        nodata_val = nodata.get(var)
    else:
        nodata_val = nodata
    x_dim, y_dim = _get_spatial_dims(self._obj, var)
    resampled_dataset[var] = (
        self._obj[var]
        .rio.set_spatial_dims(x_dim=x_dim, y_dim=y_dim, inplace=True)
        .rio.reproject(
            dst_crs,
            resolution=resolution,
            shape=shape,
            transform=transform,
            resampling=resampling,
            nodata=nodata_val,
            **kwargs,
        )
    <snip>
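The lookup on its own can be sketched as a small standalone function. This is only an illustration of the proposed scalar-or-dict behaviour: resolve_nodata and the variable names are hypothetical and not part of rioxarray.

```python
from collections.abc import Mapping


def resolve_nodata(nodata, var):
    """Return the nodata value to use for one data variable.

    Hypothetical helper (not part of rioxarray). `nodata` may be a
    scalar, None, or a mapping of variable name -> nodata value.
    """
    if isinstance(nodata, Mapping):
        # Variables missing from the mapping resolve to None, so the
        # dtype-based default would still apply to them downstream.
        return nodata.get(var)
    # A scalar (or None) applies to every variable unchanged.
    return nodata


# Example variable names are made up for illustration.
per_var = {"temperature": -9999.0, "land_cover": 255}
resolved = {
    name: resolve_nodata(per_var, name)
    for name in ("temperature", "land_cover", "elevation")
}
```

With this shape, a variable that is absent from the dict ("elevation" here) resolves to None, so the existing default behaviour would be preserved for it.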
This is the recommended approach for setting the nodata values: https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html
Thank you, @snowman2. I take your point. Ideally, data files will have the nodata values set correctly, but this is not always the case. While using one of the recommended methods is preferable, it adds an extra step to simple workflows.
Since reproject already accepts nodata as a keyword, it would be useful if that keyword also covered the case where a dataset has variables with different nodata values.
Happy to submit a PR.
A PR would be welcome!