rioxarray
Valid VRT file with raster bands of different dtypes cannot be loaded
Xarray, rasterio, and rioxarray usually have no trouble loading a VRT file in place of a real TIF file. In this case, however, I have a valid VRT file with two bands, each with a different dtype (not valid in a TIF file, but valid in a VRT file), and rioxarray refuses to load it.
Code Sample
import xarray as xr
import rioxarray
ds = xr.open_dataset("https://example.org/path/dataset.vrt", engine="rasterio")
Problem description
ValueError("All bands should have the same dtype")
Expected Output
An xarray dataset in which each band keeps its own dtype.
Environment Information
rioxarray (0.9.0) deps:
rasterio: 1.2.10
xarray: 0.20.1
GDAL: 3.3.2
Other python deps:
scipy: 1.4.1
pyproj: 3.3.0
Interesting. Are you able to provide the input files?
This may require #296 as the current method for loading brings it all into the same array.
It looks like GitHub doesn't allow uploading .vrt or .xml files, so I've renamed the file to .txt: CMRSET_LANDSAT_V2_2_2021_01_01_ETa.txt. Rename it back to .vrt to test it. It contains two bands: VRTRasterBand 1 with a datatype of Int16, and VRTRasterBand 2 with a datatype of Byte.
Note, this is just the virtual file. The real files are here: https://swift.rc.nectar.org.au/v1/AUTH_05bca33fce34447ba7033b9305947f11/landscapes-csiro-aet-public/v2_2/2021/2021_01_01/
They're very large, but you can load the vrt file remotely with
/vsicurl/https://swift.rc.nectar.org.au/v1/AUTH_05bca33fce34447ba7033b9305947f11/landscapes-csiro-aet-public/v2_2/2021/2021_01_01/CMRSET_LANDSAT_V2_2_2021_01_01_ETa.vrt
and you don't need to download anything.
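For anyone who wants to confirm the band types without opening any raster data, here is a minimal sketch that reads them straight from the VRT XML using only the standard library. The inline SAMPLE_VRT is a hypothetical, trimmed-down stand-in for the attached file (a real VRT also carries georeferencing and source elements), and vrt_band_dtypes is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down VRT with the two band types described above.
SAMPLE_VRT = """<VRTDataset rasterXSize="4" rasterYSize="4">
  <VRTRasterBand dataType="Int16" band="1"/>
  <VRTRasterBand dataType="Byte" band="2"/>
</VRTDataset>"""


def vrt_band_dtypes(vrt_xml):
    """Return {band number: GDAL data type name} for a VRT document."""
    root = ET.fromstring(vrt_xml)
    return {
        int(band.get("band")): band.get("dataType")
        for band in root.findall("VRTRasterBand")
    }


band_dtypes = vrt_band_dtypes(SAMPLE_VRT)
print(band_dtypes)
# More than one distinct type means the bands cannot share a single array dtype.
print("mixed dtypes:", len(set(band_dtypes.values())) > 1)
```

To run it on the attached file, rename it back to .vrt and pass the file's text to vrt_band_dtypes instead of SAMPLE_VRT.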
I looked into the source code to see what it would take to fix this (I thought it might be a quick, simple fix), but I noticed that it loads all bands into one array, so yes, I agree it would need #296 before this can be fixed.
Note, the example files I linked to above (the ones hosted on swift.rc.nectar.org.au) have now been replaced with VRT files with only a single band, to work around this problem. So they can no longer be used for reproducing/testing this issue.
"replaced with VRT files with only a single band, to work around this problem"
That sounds like a reasonable workaround.
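One possible way to produce such single-band files is sketched below. This is an assumption about how it could be done, not a description of how the hosted files were actually regenerated. It copies the VRT document once per band and drops the other VRTRasterBand elements; split_vrt_bands and SAMPLE_VRT are hypothetical names and data used only for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical two-band VRT, reduced to the elements this sketch touches.
SAMPLE_VRT = """<VRTDataset rasterXSize="4" rasterYSize="4">
  <VRTRasterBand dataType="Int16" band="1"/>
  <VRTRasterBand dataType="Byte" band="2"/>
</VRTDataset>"""


def split_vrt_bands(vrt_xml):
    """Return {original band number: XML text of a single-band VRT}."""
    root = ET.fromstring(vrt_xml)
    single_band_vrts = {}
    for band in root.findall("VRTRasterBand"):
        number = int(band.get("band"))
        new_root = copy.deepcopy(root)
        # findall returns a list, so removing children while looping is safe.
        for other in new_root.findall("VRTRasterBand"):
            if int(other.get("band")) != number:
                new_root.remove(other)  # drop every other band
            else:
                other.set("band", "1")  # the kept band becomes band 1
        single_band_vrts[number] = ET.tostring(new_root, encoding="unicode")
    return single_band_vrts


for number, xml_text in split_vrt_bands(SAMPLE_VRT).items():
    print(number, xml_text)
```

If the sources inside the VRT use paths relative to the VRT file, the generated files would need to be written to the same directory as the original for those paths to keep resolving.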
If you have a way of making a VRT with each file as a subdataset (so it would behave like a netCDF file), that would also be a nice workaround in my opinion. That way the datatype and other metadata would be kept in separate variables.
Alternatively, one idea I had was that rioxarray could be modified to pick the largest data type and convert all of the other bands to it, so that no data loss occurs. However, this only works if the nodata value and other metadata are the same across bands. Otherwise, it could be dangerous to do so.
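A rough sketch of that idea, using NumPy's type promotion rules to stand in for "the largest data type". This is not something rioxarray does; common_dtype is a hypothetical helper, and the example assumes that GDAL's Int16 and Byte correspond to NumPy's int16 and uint8.

```python
import numpy as np


def common_dtype(band_dtypes, band_nodata):
    """Pick one dtype that can hold every band, or raise if nodata differs."""
    # Exact equality check; NaN nodata values would need special handling.
    if len(set(band_nodata)) > 1:
        raise ValueError(f"bands have different nodata values: {band_nodata}")
    # NumPy promotion finds the smallest dtype that safely holds all inputs.
    return np.result_type(*(np.dtype(d) for d in band_dtypes))


# Int16 and Byte bands with a shared (hypothetical) nodata value of 0.
print(common_dtype(["int16", "uint8"], [0, 0]))

# Differing nodata values are rejected rather than silently merged.
try:
    common_dtype(["int16", "uint8"], [-9999, 255])
except ValueError as err:
    print("refused:", err)
```

A real implementation would also have to compare scale, offset, and any other per-band metadata before promoting.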
With #296 resolved, I believe this is resolved as well.