
Refactoring the io and in-memory lazy vs non-lazy representation

Open LucaMarconato opened this issue 2 years ago • 6 comments

Here are the linked issues.

Incremental IO: https://github.com/scverse/spatialdata/issues/186, https://github.com/scverse/spatialdata/issues/222, and an old PR containing discussion: https://github.com/scverse/spatialdata/pull/138

Lazy vs non-lazy: https://github.com/scverse/spatialdata/issues/243, https://github.com/scverse/spatialdata/issues/153, https://github.com/scverse/spatialdata/issues/297

LucaMarconato avatar Jun 09 '23 12:06 LucaMarconato

@lopollar reported that saving extra columns in shapes layers is currently not working: https://github.com/scverse/spatialdata/issues/311. I think it used to work, but we didn't have a test for it.

@giovp, when working on this PR, please also add a test for this.

LucaMarconato avatar Jun 27 '23 22:06 LucaMarconato

Refactoring the geometry argument of the shapes model: https://github.com/scverse/spatialdata/issues/315

LucaMarconato avatar Jul 02 '23 22:07 LucaMarconato

Question regarding incremental IO: is the plan to remove the write/read steps entirely when __setitem__ is called, unless overwrite=True? In that case a call to write follows, so would it make sense to reload the data lazily?

giovp avatar Jul 30 '23 22:07 giovp

I'd maybe remove add_image(), etc. and leave only __setitem__. This way there is no option to pass overwrite=True. I would then add a method to re-save a specific element to disk. So the workflow would be:

sdata.images['my_image'] = im
# or even the following, which is already implemented and uses single dispatch to forward
# to the right `__setitem__`
sdata['my_image'] = im
sdata.write('my_data.zarr')
# example of in-place operation (can be both lazy or in-memory)
sdata['my_image'] = sdata['my_image'] + 1
sdata.write(element='my_image')

We could add a parameter reload=True to write() that does the lazy re-reading, and let the user disable it in specific cases, for example when the data is fully in memory and the user wants to keep it there.

LucaMarconato avatar Jul 30 '23 23:07 LucaMarconato

I'd then have something like this to load an element in memory.

# everything is read lazily by default
sdata = read_zarr('my_data.zarr')
sdata.load_in_memory(element='my_image')
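
To make the proposed semantics concrete, here is a minimal toy sketch of the workflow discussed above: `__setitem__` only touches memory, `write()` can re-save a single element, `reload` controls whether the element is swapped for a lazy handle afterwards, and `load_in_memory()` brings it back. Everything here is an assumption for illustration: `ToyContainer` and `LazyElement` are hypothetical names, pickle files stand in for the Zarr store, and none of this is the spatialdata implementation.

```python
import pickle
import tempfile
from pathlib import Path


class LazyElement:
    """Hypothetical lazy handle: keeps only the path, reads data on demand."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        with self.path.open("rb") as f:
            return pickle.load(f)


class ToyContainer:
    """Toy model of the proposed API (not the spatialdata implementation)."""

    def __init__(self):
        self.elements = {}
        self.path = None

    def __setitem__(self, name, value):
        # Assignment only changes the in-memory object; no disk IO happens here.
        self.elements[name] = value

    def __getitem__(self, name):
        return self.elements[name]

    def write(self, path=None, element=None, reload=True):
        # A path is needed the first time; later calls reuse it.
        if path is not None:
            self.path = Path(path)
            self.path.mkdir(parents=True, exist_ok=True)
        if self.path is None:
            raise ValueError("write() needs a path the first time it is called")
        # Write either the single requested element or all of them.
        names = [element] if element is not None else list(self.elements)
        for name in names:
            value = self.elements[name]
            if isinstance(value, LazyElement):
                value = value.load()
            target = self.path / f"{name}.pkl"
            with target.open("wb") as f:
                pickle.dump(value, f)
            if reload:
                # Replace the in-memory object with a lazy handle to the file.
                self.elements[name] = LazyElement(target)

    def load_in_memory(self, element):
        value = self.elements[element]
        if isinstance(value, LazyElement):
            self.elements[element] = value.load()


with tempfile.TemporaryDirectory() as tmp:
    sdata = ToyContainer()
    sdata["my_image"] = [1, 2, 3]
    sdata.write(Path(tmp) / "my_data")  # reload=True by default
    lazy_after_write = isinstance(sdata["my_image"], LazyElement)

    # In-place style update, then re-save only that element and keep it in memory.
    sdata.load_in_memory("my_image")
    sdata["my_image"] = [v + 1 for v in sdata["my_image"]]
    sdata.write(element="my_image", reload=False)
    kept_in_memory = sdata["my_image"]
    on_disk = LazyElement(Path(tmp) / "my_data" / "my_image.pkl").load()
```

With `reload=False` the element stays a plain in-memory object after writing, which is the opt-out case mentioned above.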

LucaMarconato avatar Jul 30 '23 23:07 LucaMarconato

We should also consider adding an option to keep the data lazy but make it persistent (.persist()).
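
For reference, a small sketch of the distinction in plain Dask, assuming the lazy elements are backed by Dask arrays: `.compute()` returns a concrete NumPy array, while `.persist()` returns a Dask array whose chunks have already been computed and are held in memory. The array below is a toy example, not a spatialdata element.

```python
import dask.array as da
import numpy as np

# Building the expression only creates a task graph; nothing is computed yet.
lazy = da.ones((4, 4), chunks=(2, 2)) + 1

# persist(): still a Dask array (chunked, same API), but its chunks are
# computed once and kept in memory.
persisted = lazy.persist()

# compute(): the whole result is materialized as a single NumPy array.
in_memory = lazy.compute()
```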

LucaMarconato avatar Jun 11 '24 13:06 LucaMarconato