zarr-python
Memory leak when saving in parallel
Zarr version: v2.12.0
Numcodecs version: v0.10.2
Python Version: 3.9.13
Operating System: Linux
Installation: using pip (in a conda environment)
Description
Hi Zarr team!
We use the ProcessPoolExecutor to distribute writing to different Zarr chunks over different jobs (see here).
The function that each job runs simply loads a buffer and saves it to the appropriate zarr array location. We noticed that the buffer that is saved to zarr (traces) is not properly garbage collected, which makes the RAM usage grow as the process continues. We patched it on our side by forcing a garbage collection in place.
We think this issue is related to Zarr because we have the exact same mechanism for writing to binary files, and in that case the RAM usage is what we expect.
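For context, here is a minimal sketch of the pattern, including the forced garbage collection we added as a workaround. It is simplified: the path, dataset name, chunk size, and the random data are placeholders, and the real code lives in spikeinterface.core.core_tools.

```python
import gc
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import zarr

ZARR_PATH = "test_ram.zarr"   # placeholder path
NUM_CHANNELS = 64
CHUNK_FRAMES = 30_000         # placeholder chunk size (frames)
NUM_CHUNKS = 20

def _write_chunk(chunk_index):
    # worker: load a buffer and write it to a non-overlapping block of the dataset
    start = chunk_index * CHUNK_FRAMES
    end = start + CHUNK_FRAMES
    dataset = zarr.open(ZARR_PATH, mode="r+")["traces_seg0"]
    # stand-in for the real "load a buffer" step
    traces = np.random.randn(CHUNK_FRAMES, NUM_CHANNELS).astype("float32")
    dataset[start:end, :] = traces
    # workaround we applied: force collection so the buffer is released
    del traces
    gc.collect()

if __name__ == "__main__":
    root = zarr.open(ZARR_PATH, mode="w")
    root.create_dataset(
        "traces_seg0",
        shape=(NUM_CHUNKS * CHUNK_FRAMES, NUM_CHANNELS),
        chunks=(CHUNK_FRAMES, NUM_CHANNELS),
        dtype="float32",
    )
    with ProcessPoolExecutor(max_workers=4) as executor:
        list(executor.map(_write_chunk, range(NUM_CHUNKS)))
```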
Steps to reproduce
The problem can be reproduced using SpikeInterface v0.94.0:
```
pip install spikeinterface[full]==0.94.0
```
Here is a sample script to reproduce the issue:
```python
import spikeinterface.full as si

# generate a sample recording
recording, _ = si.toy_example(num_channels=64, duration=600, num_segments=1)

# save it to zarr with parallelization (by default it will use blosc-zstd)
recording_zarr = recording.save(format="zarr", zarr_path="test_ram.zarr",
                                n_jobs=4, total_memory="500M",
                                progress_bar=True)
```
Note that the chunk size is adjusted so that the number of jobs times the memory needed by each chunk is ~500MB (total_memory).
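As a rough illustration of that sizing (assuming float32 traces; the exact byte size depends on the recording dtype):

```python
n_jobs = 4
total_memory = 500e6          # bytes, from total_memory="500M"
num_channels = 64
bytes_per_sample = 4          # assuming float32

memory_per_chunk = total_memory / n_jobs                           # ~125 MB per job
chunk_frames = int(memory_per_chunk // (num_channels * bytes_per_sample))
print(chunk_frames)           # ~488,000 frames per chunk
```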
Additional output: No response
Thanks for the write up, @alejoe91. I don't see anything immediately surprising in https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/core/core_tools.py#L635-L709 which definitely makes me worry. Could you help us understand what zarr-level calls are being made in a typical run?
@joshmoore sure, here is how the zarr calls are distributed:
- The `save()` function is routed to the `_save()` function, which is specific to Zarr. Here the zarr file is created and several groups and small datasets are added to it.
- The `write_traces_to_zarr` function does the actual writing (see the sketch below):
  - we create the large datasets, making sure that their chunk size matches the chunk size used by the parallel processing (so we write to non-overlapping blocks)
  - the parallel processing is carried out by the `ChunkRecordingExecutor` class, which internally uses the built-in `ProcessPoolExecutor`
  - each job is initialized with an `_init_func`, which allows us to do operations that are only required once (e.g., we reopen the Zarr object and store the datasets that need writing in a `context`)
  - the `_write_zarr_chunk` function is then run for each chunk: it retrieves the data that need to be written and writes them to the zarr dataset
I hope this makes it clearer!
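In code, the flow is roughly the following. This is only a simplified sketch: the function names follow the description above, while the path, dataset name, chunk layout, and zero-filled traces are placeholders (the real logic lives in `ChunkRecordingExecutor`):

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import zarr

# per-process context, filled once by the initializer
_worker_ctx = {}

def _init_func(zarr_path, dataset_name):
    # runs once per worker process: reopen the Zarr group and cache the dataset
    root = zarr.open(zarr_path, mode="r+")
    _worker_ctx["dataset"] = root[dataset_name]

def _write_zarr_chunk(frame_slice):
    # runs for each chunk: retrieve the data for this block and write it
    start, end = frame_slice
    dataset = _worker_ctx["dataset"]
    traces = np.zeros((end - start, dataset.shape[1]), dtype=dataset.dtype)  # stand-in for the real traces
    dataset[start:end, :] = traces

if __name__ == "__main__":
    chunk_frames = 30_000
    root = zarr.open("test_ram.zarr", mode="a")
    root.require_dataset("traces_seg0", shape=(10 * chunk_frames, 64),
                         chunks=(chunk_frames, 64), dtype="float32")
    chunks = [(i * chunk_frames, (i + 1) * chunk_frames) for i in range(10)]
    with ProcessPoolExecutor(max_workers=4,
                             initializer=_init_func,
                             initargs=("test_ram.zarr", "traces_seg0")) as executor:
        list(executor.map(_write_zarr_chunk, chunks))
```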
Thanks for the explanation, @alejoe91. Unfortunately, I don't think I'm going to be able to get to a reproducible example from your description. Would it be possible to extract the Zarr code or to dump your process's memory after a few iterations so we can pinpoint what's leaking?
@joshmoore I'll try to print out RAM usage with and without forcing GC. I'm a bit busy these couple of days. Planning to do it early next week.
Hi @joshmoore
Sorry for the delay in getting back to you.
I tested on my local machine and it seems that RAM usage is under control and it doesn't grow as reported here.
I initially encountered the issue using a cloud resource from GCP, so the environment might be the culprit. I'll repeat the test there to see if that architecture is triggering the abnormal RAM consumption.
Here is a log which prints the start_frame, end_frame, and RAM usage at each iteration (using 4 jobs, 1s chunk size). I'll provide the same log when running on GCP in a few days.
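The logging is along these lines (a psutil-based sketch, not the exact code):

```python
import os
import psutil

_process = psutil.Process(os.getpid())

def log_ram(start_frame, end_frame):
    # print the resident set size of the current process at each iteration
    rss_mb = _process.memory_info().rss / 1e6
    print(f"start_frame={start_frame} end_frame={end_frame} RSS={rss_mb:.1f} MB")
```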
@alejoe91 when you encountered the initial issue, were you storing the Zarr on GCS?
If so, I think I'm running into the same thing... In my case, data I've written to a Zarr group that uses gcsfs isn't garbage collected.
cc @martindurant
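For concreteness, my setup is roughly the following (bucket path, array shape, and chunking are placeholders):

```python
import gcsfs
import numpy as np
import zarr

fs = gcsfs.GCSFileSystem()
store = fs.get_mapper("my-bucket/test.zarr")   # placeholder bucket/path
root = zarr.open_group(store, mode="a")
arr = root.require_dataset("data", shape=(100, 1000, 1000),
                           chunks=(1, 1000, 1000), dtype="float32")

for i in range(arr.shape[0]):
    # the buffer written here appears to stay referenced instead of being freed
    arr[i] = np.random.randn(1000, 1000).astype("float32")
```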
Sounds like someone needs to run pympler? I don't have any immediate ideas why gcsfs should be holding on to references.
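For example, pympler's SummaryTracker can be wrapped around a few writes:

```python
from pympler import tracker

tr = tracker.SummaryTracker()
# ... perform a handful of zarr writes here ...
tr.print_diff()   # summarizes which object types (and how many bytes) have grown
```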