Jupyter kernel crashing with sdata.write
Describe the bug
I am writing large sdata objects to Zarr, and the kernel fails in an unpredictable manner.
I parse a large mIF image (15, 44470, 73167) (8-bit) into an sdata object with scale factors (5, 5) to create a multiscale object. Writing that simple sdata object then seems to fail (it takes about 20 min, so I have only tried twice).
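For reference, a minimal sketch of how I build and write the object (the source path, chunk sizes, and element name are placeholders, not my exact code):

```python
import dask.array as da
from spatialdata import SpatialData
from spatialdata.models import Image2DModel

# lazily load the (c, y, x) = (15, 44470, 73167) uint8 image; the source path is a placeholder
img = da.from_zarr("raw_mif_image.zarr")

# parse into a multiscale image: two extra scales, each downscaled by a factor of 5,
# with explicit (illustrative) chunk sizes set via the parser
image = Image2DModel.parse(
    img,
    dims=("c", "y", "x"),
    scale_factors=[5, 5],
    chunks=(1, 4096, 4096),
)

sdata = SpatialData(images={"mIF": image})
sdata.write("mif.zarr")  # this is the step where the kernel dies
```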
Before I send over this large dataset: are there any expected limitations when writing sdata objects to Zarr from a Jupyter notebook? My naive concerns are:
- chunking (what if the chunk size is larger than the downscaled image size? can I chunk different scales dynamically? I use the parser to chunk)
- Hardware (I use an M2 MacBook Pro).
This kind of kernel failure is particularly frustrating because it corrupts the Zarr object: I was writing some new elements (another very large image), the kernel crashed, and it killed the object.
Thanks for reporting, and sorry to hear about this bug; it does sound frustrating.
> Before I send over this large dataset: are there any expected limitations when writing sdata objects to Zarr from a Jupyter notebook?
No, there is no expected limitation in .ipynb vs .py for this task.
> chunking (what if the chunk size is larger than the downscaled image size? can I chunk different scales dynamically? I use the parser to chunk)
Yes, you can rechunk the data after calling the parser and before saving, as you see fit; for instance, check the code from here. In any case, it could be that your problem is due to a bug involving compression, please check here: https://github.com/scverse/spatialdata/issues/812#issuecomment-2575983527.
It could also be due to this: https://github.com/scverse/spatialdata/issues/821#issuecomment-2632201695 (but I think it's due to the bug above).
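Regarding the rechunking, here is a minimal sketch of what I mean by rechunking each scale explicitly before writing; the element name "mIF", the scale names scale0, scale1, …, the "image" variable name, and the chunk sizes are assumptions about your object, so adjust them as needed:

```python
# rechunk every scale of the multiscale image to explicit, fixed chunk sizes (not "auto");
# "mIF", the scale node names and the "image" variable name are assumed, not taken from your code
msi = sdata["mIF"]
for scale_name in list(msi.children):
    msi[scale_name]["image"] = msi[scale_name]["image"].chunk({"c": 1, "y": 4096, "x": 4096})

sdata.write("mif.zarr")
```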
Hey @LucaMarconato,
Here is what I have learnt:
- #812 is not helpful in this case: I used to get the 2 GB buffer error, and since then I chunk into chunks smaller than 2 GB, which also solved the issue for me (a quick check for this is sketched after this list).
- #478: auto rechunking for each scale does not currently work for me; it leads to
  ``ValueError: Attempt to save array to zarr with irregular chunking, please call `arr.rechunk(...)` first``
- #821 was a great thread about the chunking issue and the current Dask issues.
- #861: I look forward to this PR :)
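For completeness, here is roughly how I now check that no chunk exceeds the ~2 GiB buffer limit; the element and variable names are placeholders for my actual object:

```python
import itertools
import numpy as np

# dask array backing the full-resolution scale; element/variable names are placeholders
arr = sdata["mIF"]["scale0"]["image"].data

# size in bytes of the largest chunk: product of per-dimension chunk lengths times the item size
max_chunk_bytes = max(
    np.prod(chunk_shape) * arr.dtype.itemsize
    for chunk_shape in itertools.product(*arr.chunks)
)
assert max_chunk_bytes < 2 * 1024**3, f"largest chunk is {max_chunk_bytes / 1024**3:.2f} GiB"
```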
Hi @josenimo, please let me know if rechunking without using 'auto', as shown here https://github.com/scverse/spatialdata/issues/812#issuecomment-2575983527, fixes your problem.