
Jupyter kernel crashing with sdata.write

Open · josenimo opened this issue 11 months ago · 4 comments

Describe the bug
I am writing large sdata objects to Zarr, and the kernel fails in an unpredictable manner.

I parse a large mIF image of shape (15, 44470, 73167) (8-bit) into sdata with scale factors (5, 5) to create a multiscale object. Writing that simple sdata object then seems to fail (each attempt takes about 20 minutes, so I have only tried twice). A rough sketch of the workflow is below.
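
This is only a minimal, hedged sketch of what I do; the dummy array, element name, and chunk sizes are placeholders, not my actual notebook code:

```python
import dask.array as da
import numpy as np
from spatialdata import SpatialData
from spatialdata.models import Image2DModel

# Placeholder for the real mIF image: a lazy (c, y, x) dask array,
# 15 channels, uint8.
img = da.zeros((15, 44470, 73167), dtype=np.uint8, chunks=(1, 8192, 8192))

# Parse into a multiscale image; scale_factors=(5, 5) adds two extra levels,
# each downscaled 5x relative to the previous one.
image = Image2DModel.parse(img, dims=("c", "y", "x"), scale_factors=(5, 5))

sdata = SpatialData(images={"mIF": image})
sdata.write("large_mIF.zarr")  # this is the step where the kernel dies
```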

Before I send over this large data, are there any expected limitations from writing sdata objects into Zarr in a Jupyter notebook? My naive concerns are:

  1. Chunking: what if the chunk size is larger than a downscaled image? Can I chunk the different scales dynamically? I currently use the parser to chunk.
  2. Hardware: I use an M2 MacBook Pro.

These kernel failures are particularly frustrating because they corrupt the Zarr store: I was writing some new elements (another very large image) when the kernel crashed, and the crash killed the object.

josenimo · Jan 14 '25 10:01

Thanks for reporting, and sorry to hear about this bug; it does indeed sound frustrating.

Before I send over this large data, are there any expected limitations from writing sdata objects into Zarr in a Jupyter notebook?

No, there is no expected limitation in .ipynb vs .py for this task.

Chunking: what if the chunk size is larger than a downscaled image? Can I chunk the different scales dynamically? I currently use the parser to chunk.

Yes, you can rechunk the data after calling the parser and before saving, as you see fit; for instance, check the code linked here. A sketch of this is below. That said, your problem could also be due to a bug involving compression; please check here: https://github.com/scverse/spatialdata/issues/812#issuecomment-2575983527.
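
Something along these lines should work (a minimal sketch assuming the default multiscale layout, i.e. child nodes "scale0", "scale1", … each holding an "image" DataArray; the element name and chunk sizes are just examples):

```python
# Rechunk every scale to explicit, regular chunk sizes after parsing
# and before writing; adjust the sizes to your data.
msi = sdata.images["mIF"]           # a DataTree when scale_factors were used
for scale in msi.children:          # "scale0", "scale1", "scale2", ...
    xarr = msi[scale]["image"]
    xarr.data = xarr.data.rechunk((1, 4096, 4096))

sdata.write("large_mIF.zarr")
```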

LucaMarconato · Jan 14 '25 14:01

It could also be due to this: https://github.com/scverse/spatialdata/issues/821#issuecomment-2632201695 (but I think it's the bug above).

LucaMarconato · Feb 04 '25 15:02

Hey @LucaMarconato,

Here is what I have learnt:

  • #812 is not helpful in this case. I used to get the 2 GB buffer error, and since then I chunk into chunks smaller than 2 GB, which also solved that issue for me (see the sketch after this list).
  • #478 Auto rechunking for each scale does not work for me as of now; it leads to a
ValueError: Attempt to save array to zarr with irregular chunking, please call `arr.rechunk(...)` first
  • #821 was a great thread about the chunking issue and current Dask issues

  • #861 I look forward to this PR :)
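
For reference, this is roughly how I keep each chunk far below the 2 GB compression-buffer limit by passing the chunk size to the parser (a sketch only; the sizes are illustrative, and a (1, 4096, 4096) uint8 chunk is only about 16 MB):

```python
# Chunk at parse time so every chunk stays well below the ~2 GB buffer limit.
image = Image2DModel.parse(
    img,
    dims=("c", "y", "x"),
    scale_factors=(5, 5),
    chunks=(1, 4096, 4096),
)
```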

josenimo · Feb 11 '25 16:02

Hi @josenimo, please let me know whether rechunking without using 'auto', as shown here: https://github.com/scverse/spatialdata/issues/812#issuecomment-2575983527, fixes your problem. A minimal contrast between the two calls is sketched below.
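
For example, on the underlying dask array (a minimal sketch; the array shape and chunk sizes are only illustrative):

```python
import dask.array as da
import numpy as np

arr = da.zeros((15, 44470, 73167), dtype=np.uint8, chunks=(1, 8192, 8192))

# 'auto' lets dask pick the chunking and, as reported above, can end up
# being rejected by the zarr writer as irregular:
auto_rechunked = arr.rechunk("auto")

# Explicit, regular chunk sizes avoid that:
explicit = arr.rechunk((1, 4096, 4096))
```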

LucaMarconato · Feb 11 '25 18:02