rioxarray

Valid VRT file with raster bands of different dtypes cannot be loaded

Open ashleysommer opened this issue 2 years ago • 6 comments

Xarray, Rasterio, and rioxarray don't usually have any issue loading a VRT file in place of a real TIF file. However, in this case I have a valid VRT file with two bands, each with a different dtype. That is not valid in a TIF file, but it is valid in a VRT file, and rioxarray refuses to load it.

Code Sample

import xarray as xr
import rioxarray
ds = xr.open_dataset("https://example.org/path/dataset.vrt", engine="rasterio")

Problem description

ValueError("All bands should have the same dtype")

Expected Output

An xarray dataset in which each band keeps its own dtype.

Environment Information

rioxarray (0.9.0) deps:
  rasterio: 1.2.10
    xarray: 0.20.1
      GDAL: 3.3.2

Other python deps:
     scipy: 1.4.1
    pyproj: 3.3.0

ashleysommer avatar Mar 03 '22 02:03 ashleysommer

Interesting. Are you able to provide the input files?

snowman2 avatar Mar 03 '22 02:03 snowman2

This may require #296, as the current loading method reads all bands into the same array.

snowman2 avatar Mar 03 '22 02:03 snowman2

Looks like GitHub doesn't allow uploading .vrt or .xml files, so I've renamed it to .txt: CMRSET_LANDSAT_V2_2_2021_01_01_ETa.txt. Rename it back to .vrt to test it. Inside there are two bands: VRTRasterBand 1 with a datatype of Int16, and VRTRasterBand 2 with a datatype of Byte.

Note, this is just the virtual file. The real files are here: https://swift.rc.nectar.org.au/v1/AUTH_05bca33fce34447ba7033b9305947f11/landscapes-csiro-aet-public/v2_2/2021/2021_01_01/

They're very large, but you can load the VRT file remotely with /vsicurl/https://swift.rc.nectar.org.au/v1/AUTH_05bca33fce34447ba7033b9305947f11/landscapes-csiro-aet-public/v2_2/2021/2021_01_01/CMRSET_LANDSAT_V2_2_2021_01_01_ETa.vrt so you don't need to download anything.
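For anyone wanting to check whether a VRT mixes band datatypes before handing it to rioxarray, here is a minimal sketch that reads the `dataType` attribute of each `VRTRasterBand` element using only the standard library. The `SAMPLE_VRT` string is a made-up stand-in that mirrors the two bands described above (Int16 and Byte), not the actual file, and `band_dtypes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down VRT with the same band layout as described
# above: band 1 is Int16, band 2 is Byte. A real VRT also lists sources.
SAMPLE_VRT = """<VRTDataset rasterXSize="4" rasterYSize="4">
  <VRTRasterBand dataType="Int16" band="1"/>
  <VRTRasterBand dataType="Byte" band="2"/>
</VRTDataset>"""


def band_dtypes(vrt_xml):
    """Map band number -> GDAL datatype name declared in the VRT XML."""
    root = ET.fromstring(vrt_xml)
    result = {}
    for position, band in enumerate(root.iter("VRTRasterBand"), start=1):
        # Fall back to document order if the band attribute is missing.
        number = int(band.get("band", position))
        result[number] = band.get("dataType")
    return result


dtypes = band_dtypes(SAMPLE_VRT)
mixed = len(set(dtypes.values())) > 1
print(dtypes, "mixed dtypes:", mixed)
```

To run it against a real file, read the .vrt contents as text and pass them to `band_dtypes`.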

ashleysommer avatar Mar 03 '22 02:03 ashleysommer

I was looking into the source code to see what it would take to fix this (I thought it might be a quick, simple fix), but I noticed that it loads all bands into one array. So yes, I'd say it would need #296 before this can be fixed.

ashleysommer avatar Mar 03 '22 03:03 ashleysommer

Note, the example files I linked to above (the ones hosted on swift.rc.nectar.org.au) have now been replaced with VRT files with only a single band, to work around this problem. So they can no longer be used for reproducing/testing this issue.

ashleysommer avatar Mar 03 '22 06:03 ashleysommer

> replaced with VRT files with only a single band, to work around this problem

That sounds like a reasonable workaround.

If you have a way of making a VRT with each file as a subdataset (so it would behave like a netCDF file), that would also be a nice workaround in my opinion. That way the datatype and other metadata would be separated into separate variables.

Alternatively, one idea I had was that rioxarray could be modified to pick the largest data type and convert the bands with other data types to it, so that no data loss occurs. However, this only works if the nodata value and other metadata are the same across bands. Otherwise, it could be dangerous to do so.
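As a rough illustration of that idea (not how rioxarray actually behaves), the sketch below uses NumPy's type promotion to find a dtype that can hold every band, and refuses to proceed when the bands disagree on nodata. The helper name `common_dtype` and the tiny in-memory arrays are assumptions made for the example.

```python
import numpy as np


def common_dtype(band_dtypes, band_nodata):
    """Return a dtype able to hold every band; raise if nodata values differ."""
    # Promoting is only safe when every band shares the same nodata value.
    # (NaN nodata values would need a separate comparison; not handled here.)
    if len(set(band_nodata)) > 1:
        raise ValueError(f"Bands disagree on nodata: {band_nodata}")
    return np.result_type(*[np.dtype(d) for d in band_dtypes])


# Two toy bands standing in for the Int16 and Byte bands of the VRT.
bands = [
    np.array([[-5, 120]], dtype="int16"),
    np.array([[0, 255]], dtype="uint8"),
]
target = common_dtype([b.dtype for b in bands], [None, None])
stacked = np.stack([b.astype(target) for b in bands])
print(target, stacked.shape)
```

Here uint8 values fit inside int16, so the cast loses nothing. Other pairs can promote to a wider type than either input, which costs memory.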

snowman2 avatar Mar 03 '22 14:03 snowman2

With #296 resolved, I believe this is resolved as well.

snowman2 avatar Dec 19 '22 21:12 snowman2