Refactoring the IO and the in-memory lazy vs non-lazy representation
Here are the linked issues.

Incremental IO:
- https://github.com/scverse/spatialdata/issues/186
- https://github.com/scverse/spatialdata/issues/222
- https://github.com/scverse/spatialdata/pull/138 (old PR containing discussion)

Lazy vs non-lazy:
- https://github.com/scverse/spatialdata/issues/243
- https://github.com/scverse/spatialdata/issues/153
- https://github.com/scverse/spatialdata/issues/297
@lopollar reported that saving extra columns in shapes layers is not working at the moment: https://github.com/scverse/spatialdata/issues/311. I think it used to work, but we didn't have a test for it. @giovp, please also add a test for this when working on this PR.
Refactoring the geometry argument of the shapes model: https://github.com/scverse/spatialdata/issues/315
A question regarding incremental IO: the plan is to remove the write/read steps entirely when `__setitem__` is called, unless `overwrite=True` is passed; in that case a call to `write()` follows, so it would make sense to reload the data lazily?
I'd maybe remove `add_image()` etc. and keep only `__setitem__`; this way there is no option to pass `overwrite=True`. I would then add a method to re-save a specific element to disk. The workflow would be:
```python
sdata.images['my_image'] = im
# or even the following, which is already implemented and uses single dispatch
# to forward to the right `__setitem__`
sdata['my_image'] = im
sdata.write('my_data.zarr')

# example of an in-place operation (can be both lazy or in-memory)
sdata['my_image'] = sdata['my_image'] + 1
sdata.write(element='my_image')
```
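For illustration, the type-based forwarding mentioned in the comments above could look like the following toy sketch. The class names (`ToySpatialData`, `Image`, `Shapes`) are hypothetical stand-ins, not the actual spatialdata implementation:

```python
# Toy sketch of type-based __setitem__ forwarding.
# "Image" and "Shapes" are hypothetical stand-ins for the real element types.

class Image:
    pass

class Shapes:
    pass

class ToySpatialData:
    def __init__(self):
        self.images = {}
        self.shapes = {}

    def __setitem__(self, name, element):
        # Forward to the right container based on the element's type,
        # mimicking the single-dispatch behaviour described above.
        if isinstance(element, Image):
            self.images[name] = element
        elif isinstance(element, Shapes):
            self.shapes[name] = element
        else:
            raise TypeError(f"unsupported element type: {type(element)!r}")

sdata = ToySpatialData()
sdata["my_image"] = Image()
assert "my_image" in sdata.images
```

With this design the user-facing entry point stays a single `sdata[name] = element`, and rejecting unknown types in one place keeps the error behaviour consistent across element kinds.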
We could add a `reload=True` parameter to `write()` that does the lazy re-reading, and let the user disable it in specific cases, e.g. when the data is fully in-memory and the user wants to keep it there.
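A minimal sketch of how per-element `write()` with an optional lazy reload could behave, using an in-memory dict as a stand-in for the Zarr store (all names here are hypothetical, not the real API):

```python
class ToyStore:
    """Stand-in for a Zarr store: just a dict of element name -> data."""
    def __init__(self):
        self.data = {}

class LazyRef:
    """Stand-in for a lazily-loaded element backed by the store."""
    def __init__(self, store, name):
        self.store, self.name = store, name

    def load(self):
        return self.store.data[self.name]

class ToySpatialData:
    def __init__(self, store):
        self.store = store
        self.images = {}

    def write(self, element=None, reload=True):
        # Write one element (or all of them) to the store; optionally replace
        # the in-memory copy with a lazy reference to what was just written.
        names = [element] if element is not None else list(self.images)
        for name in names:
            value = self.images[name]
            self.store.data[name] = value.load() if isinstance(value, LazyRef) else value
            if reload:
                self.images[name] = LazyRef(self.store, name)

sdata = ToySpatialData(ToyStore())
sdata.images["my_image"] = [1, 2, 3]
sdata.write(element="my_image")             # saved, then lazily re-read
assert isinstance(sdata.images["my_image"], LazyRef)

sdata.images["other"] = [4, 5]
sdata.write(element="other", reload=False)  # saved, but kept fully in memory
assert sdata.images["other"] == [4, 5]
```

The point of the sketch is the default: after a write, the in-memory element is swapped for a lazy view of the on-disk data, so memory is released unless the user opts out with `reload=False`.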
I'd then have something like this to load an element in-memory.
```python
# everything is read lazily by default
sdata = read_zarr('my_data.zarr')
sdata.load_in_memory(element='my_image')
```
We should also consider adding the option to keep the data lazy but make it persistent (`.persist()`).
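The difference between loading in memory and persisting could be sketched like this (hypothetical names; in the real implementation the lazy objects would be dask-backed, where `.persist()` keeps the lazy interface but pins the data in memory):

```python
class ToyLazyElement:
    """Stand-in for a dask-backed element: data is computed on access."""
    def __init__(self, loader):
        self._loader = loader
        self._cache = None

    def compute(self):
        # Without persist, every access re-runs the loader (i.e. re-reads
        # from the store); after persist, the cached copy is returned.
        return self._cache if self._cache is not None else self._loader()

    def persist(self):
        # Keep the lazy interface, but pin the data in memory, loosely
        # analogous to dask's .persist().
        self._cache = self._loader()
        return self

reads = []
def loader():
    reads.append(1)          # count how often the "store" is hit
    return [1, 2, 3]

lazy = ToyLazyElement(loader)
assert lazy.compute() == [1, 2, 3]
assert lazy.compute() == [1, 2, 3]
assert len(reads) == 2       # without persist, every compute re-reads

lazy.persist()
lazy.compute()
assert len(reads) == 3       # persist read once; compute now uses the cache
```

By contrast, a `load_in_memory()` call would replace the lazy object with the computed result outright, dropping the lazy interface altogether; `persist()` is the middle ground between fully lazy and fully loaded.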