geotiff
Accessing data in chunks at runtime rather than loading everything on instantiation
Hey there!
I noticed this crate runs Decoder::read_image on instantiating a GeoTiff, meaning any time you're working with an image, the entire image must be loaded into memory. To me, this seemed like a possible performance issue for large files, especially when working with small sections of the image.
In my project, when I access data at a specific point, I instead compute the chunk index that contains the pixel(s) of interest, then use Decoder::read_chunk to access the chunk data. The result is a much smaller DecodingResult being held in memory.
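A minimal sketch of the chunk index computation described above, not the actual project code. `ChunkLayout` and `locate` are hypothetical names that exist in neither `tiff` nor `geotiff`. It assumes the image and chunk dimensions have already been obtained from the decoder, that chunks are numbered row-major (left to right, then top to bottom), and that there is a single plane of chunks.

```rust
/// Hypothetical description of a chunked image (strips or tiles).
/// Not a type from the `tiff` or `geotiff` crates.
#[derive(Debug, Clone, Copy)]
struct ChunkLayout {
    image_width: u32,
    image_height: u32,
    chunk_width: u32,  // for strips this equals the image width
    chunk_height: u32, // for strips this is the rows per strip
}

impl ChunkLayout {
    /// Number of chunks in one row of chunks (ceiling division).
    fn chunks_across(&self) -> u32 {
        (self.image_width + self.chunk_width - 1) / self.chunk_width
    }

    /// Returns `(chunk_index, local_x, local_y)` for the pixel at `(x, y)`,
    /// or `None` if the pixel lies outside the image.
    /// Assumes row-major chunk numbering and a single plane.
    fn locate(&self, x: u32, y: u32) -> Option<(u32, u32, u32)> {
        if x >= self.image_width || y >= self.image_height {
            return None;
        }
        let chunk_col = x / self.chunk_width;
        let chunk_row = y / self.chunk_height;
        let chunk_index = chunk_row * self.chunks_across() + chunk_col;
        Some((chunk_index, x % self.chunk_width, y % self.chunk_height))
    }
}
```

The returned index is what would be passed to `Decoder::read_chunk`. The local coordinates still need converting to an offset into the decoded buffer, and that depends on the row width of the returned chunk data, which can be narrower for chunks on the right edge of the image.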
I also have a method for accessing multiple points. Internally, it groups the requested points by chunk, reads each chunk once to extract all the points that fall within it, and then returns the outputs in the original order (i.e. corresponding to the inputs). Thus, only a single chunk is in memory at a given time, and each chunk is read at most once.
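Roughly, the multi-point grouping could look like the sketch below. It is an illustration under assumptions rather than the real implementation: `read_points` is a hypothetical function, each input is assumed to be already resolved to a `(chunk_index, offset_in_chunk)` pair, and the `read_chunk` closure stands in for whatever wraps `Decoder::read_chunk` and converts the result to a `Vec<T>`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: reads one value per requested location.
/// `locations[i]` is `(chunk_index, offset_in_chunk)` for the i-th point.
/// `read_chunk` is a stand-in for a real chunk reader.
fn read_points<T, E, F>(
    locations: &[(u32, usize)],
    mut read_chunk: F,
) -> Result<Vec<Option<T>>, E>
where
    T: Copy,
    F: FnMut(u32) -> Result<Vec<T>, E>,
{
    // Group requests by chunk, remembering each point's original position.
    let mut by_chunk: BTreeMap<u32, Vec<(usize, usize)>> = BTreeMap::new();
    for (position, &(chunk, offset)) in locations.iter().enumerate() {
        by_chunk.entry(chunk).or_default().push((position, offset));
    }

    let mut out: Vec<Option<T>> = vec![None; locations.len()];
    for (chunk, requests) in by_chunk {
        // Each chunk is read exactly once; its buffer is dropped at the
        // end of this iteration, so only one chunk is held at a time.
        let data = read_chunk(chunk)?;
        for (position, offset) in requests {
            // Out-of-range offsets yield `None` instead of panicking.
            out[position] = data.get(offset).copied();
        }
    }
    Ok(out)
}
```

Using a `BTreeMap` means chunks are visited in ascending index order, which tends to give roughly sequential file access. A `HashMap` would also work if the visiting order does not matter.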
I think these ideas might have some value. Perhaps they could be implemented or adapted to some degree in this crate?
Hi @kylecarow, definitely keen on accessing chunks/subsets of a GeoTIFF without loading the entire image, be it tiles (related to #3) or strips. It sounds like you've already implemented some of that logic using Decoder::read_chunk elsewhere. Would you like to bring that into this geotiff crate?
I'm travelling the next couple of weeks so might not have time to respond as quickly, but feel free to start a PR and I'm sure someone will be open to take a look 😄
@weiji14 I'm definitely open to share my work with this crate!
I've tried to draft up a PR, but so far have struggled with organizing things neatly. I think this is mostly due to the structure of the tiff crate and its restrictively private interface, where relatively little of the image metadata is publicly exposed. I'll keep at it, but I wonder if this is a relatable sentiment?
I'll let you know when there is a PR to test out.
@kylecarow, just checking if you've done any work on this yet? It looks like tiff=0.10.0 was released last month, and there's a bit more activity happening on the repo recently, so maybe worth taking another look?
I'm just gonna tag @print-sid8 here who's been keen to get a Rust port of his Python code that determines byte ranges for a given geometry AOI, see thread at https://github.com/weiji14/cog3pio/issues/16#issuecomment-3083071786 and the links there for references.