gdalraster icon indicating copy to clipboard operation
gdalraster copied to clipboard

get_block_indexing suggestion for dataset-less option

Open mdsumner opened this issue 8 months ago • 5 comments

I'm not sure how important this is, because we can create a dataset so easily and get_block_indexing() on that, but I wanted to explore how this might look and I won't be able to do that soon, so I'm dropping here as you always have really good ideas on how to add new features! (and it might be pertinent to get the idea in early before a release)

I would like to be able to do

block_indexing(ds$dim()[1:2], ds$bbox(), ds$GetBlockSize())

because, then we could entirely independently do

block_indexing(c(360, 180), c(-180, -90, 180, 90), c(24, 12))

Maybe it's a new function, or maybe it's enabled by the existing one, I wanted to explore.

mdsumner avatar Mar 27 '25 04:03 mdsumner

I guess it would have been better to make it a stand-alone function and more general (so much for good ideas on how to add new features!). In its current form, this is pretty tied to the dataset/band. The use cases I had mind were reading by block, or chunking a large raster, possibly with many blocks per chunk but cut along block boundaries. We have getBlockSize() / getActualBlockSize() as exposed methods, so it could already be done without too much trouble in R. I mainly saw this as a helper for those cases (that was quick and easy to implement). get_block_indexing() does apply the geotransform though so it should give correct extents of the individual blocks in case of south-up or rotated.

I could be missing something, but I don't think creating a temporary raster and get_block_indexing() will generalize for what you have in mind. With your example above, we could not use GTiff for 24x12 because BLOCKXSIZE and BLOCKYSIZE must be multiples of 16. I think grout covers the general case nicely! Glad to revisit at some point though if you think we need more functionality along those lines here.

ctoney avatar Mar 28 '25 14:03 ctoney

The stand-alone version of make_chunk_index in #819 may address this.

https://usdaforestservice.github.io/gdalraster/reference/make_chunk_index.html

ctoney avatar Nov 14 '25 06:11 ctoney

Just a thought: Would it be appropriate to support overlapping chunks?

For e.g. chunked focal operations it can be useful to be able to be able to specify a number of pixels of overlap larger than the search window.

brownag avatar Nov 14 '25 22:11 brownag

Possibly but I'm unsure yet the best way to implement or if it really should be in make_chunk_index().

For chunked focal operations, one option could be to define a built-in or potentially custom kernel function with rasterToVRT(), and then read by blocks or multi-block chunks through the VRT.

Or better, with GDAL >3.12 we can use the new CLI algorithm raster neighbors which supports several well known convolution kernels and custom, can write VRT or GDALG, and then do chunked I/O on that.

Would that approach satisfy the use cases you have in mind, or would a more general chunking strategy be helpful? The main idea with make_chunk_index() is to define the chunks on natural block boundaries for potential performance improvement to chunked I/O in some scenarios. That would no longer be true with the overlapping chunks you describe, but that doesn't exclude generalizing it for other reasons.

ctoney avatar Nov 17 '25 05:11 ctoney

That makes sense, the rasterToVRT/raster neighbors approach sound like good options and probably could satisfy the use cases where I have had to set up overlapping chunks.

It is easy enough to apply a buffer to the extents / nrow/ncol / offsets out of make_chunk_index(), so probably not something that needs to be explicitly supported. On top of that, often you need the original unbuffered extent to crop the buffered data after processing, so probably best to have them separate.

brownag avatar Nov 17 '25 20:11 brownag