rioxarray

Writing a large tiff without specifying BIGTIFF="YES" silently fails writing some blocks

Open alessioarena opened this issue 2 years ago • 5 comments

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

import numpy as np  # needed for np.linspace below
import xarray as xr
import dask.array as da
import rioxarray as rio

size = (30_000, 60_000)

data = xr.DataArray(
    data=da.random.random(size),
    coords={'y': np.linspace(0, size[0]*10, size[0]), 'x': np.linspace(0, size[1]*10, size[1])},
    dims=('y', 'x'),
)
data = data.rio.set_crs(3857)

data[::100, ::100].plot()
# you should get something like the image in Expected Output

data.rio.to_raster('test.tif', COMPRESS="DEFLATE")

rio.open_rasterio('test.tif', chunks='auto', parallel=True, lock=False).isel(band=0)[::100, ::100].plot()
# you should get a partial image like the one in Problem Description

Problem description

I came across this issue recently, and it seems to be linked to using COMPRESS="DEFLATE".

When running the code above, saving the image succeeds with no error or warning raised. However, upon opening the image, it looks partial. [image: Untitled]

If I perform exactly the same operation using rasterio, I instead get this error: https://gis.stackexchange.com/questions/368251/error-occurred-while-writing-dirty-block-from-gdalrasterbandirasterio As the post explains, this is linked to not specifying BIGTIFF="YES".

Expected Output

Either a correctly saved image, or the error being raised. [image: Untitled]

Environment Information

  • python -c "import rioxarray; rioxarray.show_versions()"

Python version : 3.10.12
Platform : Linux
xarray : 2023.10.1
pandas : 2.1.1
dask : 2023.10.0
numpy : 1.23.4
rasterio : 1.3.9
rioxarray : 0.15.0
geopandas : 0.14.0
shapely : 2.0.2
zarr : 2.16.1
matplotlib : 3.8.0
cartopy : 0.22.0
nbic_utils : 2.0.0
xrutils : 2.0.0

Installation method

pypi

alessioarena, Nov 06 '23 23:11

This is likely due to using a dask array when writing as it uses a different writing mechanism. Do you run into this issue with a numpy array?

snowman2, Nov 08 '23 02:11

I have also had this issue. The silent failure seems related to using dask.

pfuhe1, Dec 19 '23 22:12

I think I have seen this with rasterio too: it will just write 4GB worth and the rest is empty.

RichardScottOZ, Feb 17 '24 05:02
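
For context on the 4GB figure: classic TIFF stores file offsets as unsigned 32-bit integers, so a classic file cannot address more than 4 GiB. The sketch below is only back-of-the-envelope arithmetic for the array in the original example, assuming a single band of float64 values (the dtype that `da.random.random` returns). The helper names are hypothetical and not part of rioxarray, rasterio, or GDAL.

```python
# Classic TIFF uses unsigned 32-bit file offsets, which caps the file at 4 GiB.
CLASSIC_TIFF_LIMIT = 2**32  # bytes


def uncompressed_nbytes(height, width, bands=1, itemsize=8):
    """Raw pixel payload in bytes (hypothetical helper; itemsize=8 assumes float64)."""
    return height * width * bands * itemsize


def exceeds_classic_tiff(nbytes):
    """True when the raw payload alone would not fit in a classic TIFF."""
    return nbytes >= CLASSIC_TIFF_LIMIT


size = (30_000, 60_000)  # same shape as in the issue's example
nbytes = uncompressed_nbytes(*size)
print(nbytes)                        # 14400000000
print(exceeds_classic_tiff(nbytes))  # True
```

The compressed size is not known before writing, and random values compress poorly, so the DEFLATE output for this array would most likely still be larger than 4 GiB.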

I am guessing this is related: https://github.com/corteva/rioxarray/issues/220 See: https://corteva.github.io/rioxarray/latest/examples/dask_read_write.html

snowman2, Mar 01 '24 19:03

From: https://gdal.org/drivers/raster/gtiff.html

Default: BIGTIFF=IF_NEEDED Description: "will only create a BigTIFF if it is clearly needed (in the uncompressed case, and image larger than 4GB. So no effect when using a compression)."

In your example, COMPRESS="DEFLATE" is set, so you need to set BIGTIFF=YES for the write to succeed. For a more explicit error message, the change would likely need to happen in GDAL.

snowman2, Apr 22 '24 18:04
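
A minimal sketch of the workaround, plus a way to confirm which TIFF flavor ended up on disk. The `to_raster` call is shown as a comment and simply adds `BIGTIFF="YES"` next to the `COMPRESS` option from the original example. The `tiff_flavor` helper is hypothetical (not part of rioxarray or rasterio) and relies only on the TIFF header layout: a byte-order mark (`II` or `MM`) followed by the version number 42 for classic TIFF or 43 for BigTIFF.

```python
import struct


def tiff_flavor(path):
    """Return 'classic', 'bigtiff', or 'unknown' from the first 4 header bytes."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return "unknown"
    # "II" means little-endian, "MM" means big-endian.
    byte_order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if byte_order is None:
        return "unknown"
    (version,) = struct.unpack(byte_order + "H", header[2:4])
    return {42: "classic", 43: "bigtiff"}.get(version, "unknown")


# Usage with the example from this issue (requires rioxarray, so left as comments):
# data.rio.to_raster('test.tif', COMPRESS="DEFLATE", BIGTIFF="YES")
# print(tiff_flavor('test.tif'))
```

This check only reports how the file was created. It does not verify that every block was written, so reading the data back and comparing it is still worthwhile.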