
Recording segment properties don't survive round-trip to & from zarr

grahamfindlay opened this issue 4 weeks ago · 1 comment

When writing recordings to zarr, properties of the recording object itself are stored, but properties of the segments are not. For example:

import spikeinterface as si
from spikeinterface import ConcatenateSegmentRecording
from numcodecs import Delta
from wavpack_numcodecs import WavPack

# my_recording_generator() is a user-defined function that builds the recording
recording: ConcatenateSegmentRecording = my_recording_generator()

# Confirm that the recording and its lone segment have a sampling_frequency property.
assert recording.get_num_segments() == 1
sf_rec = recording.get_sampling_frequency()
sf_seg = recording._recording_segments[0].sampling_frequency
assert sf_rec == sf_seg
print(sf_seg) # 30000.13147632312

recording.save(
    format="zarr",
    folder="path/to/destination.zarr",
    compressor_by_dataset={"traces": WavPack(bps=2.25)},
    filters_by_dataset={"times": [Delta(dtype="float64")]},
    n_jobs=80,
)

reloaded_recording = si.load("path/to/destination.zarr")
print(reloaded_recording.get_sampling_frequency()) # 30000.13147632312
assert reloaded_recording.get_num_segments() == 1
assert reloaded_recording._recording_segments[0].sampling_frequency is None # Oh no!

This has downstream effects, e.g. spikeinterface.preprocessing.gaussian_filter() fails when GaussianFilterRecordingSegment tries to read the sampling frequency from the segment and gets None.
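For example, a minimal (hypothetical) reproduction of the downstream failure; the filter parameters here are illustrative:

from spikeinterface.preprocessing import gaussian_filter

# Build the filter on the reloaded recording; the preprocessor segment wraps
# the zarr segment whose sampling_frequency is None after the round-trip.
filtered = gaussian_filter(reloaded_recording, freq_min=300.0, freq_max=5000.0)

# Fails when GaussianFilterRecordingSegment tries to use the segment's
# (None) sampling frequency.
traces = filtered.get_traces(segment_index=0, start_frame=0, end_frame=1000)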

The issue is that in recording.save() -> ZarrRecordingExtractor.write_recording() -> add_recording_to_zarr_group(), properties of the recording segments are not stored in the attrs of the dataset created by zarr_group.create_dataset() in add_traces_to_zarr(). I guess the place for this to happen would be the subsequent call to add_properties_and_annotations(zarr_group, recording): instead of only setting properties of the recording as new datasets on the "properties" group, it could also iterate through the segments. That could mean a props_seg{i} group with attributes stored as datasets if the attributes can be complex, or just a simple dset.attrs["sampling_frequency"] = ... if they are known to be JSON-serializable (see the sketch below). (FYI, in Zarr v3 you can pass a dictionary of JSON-serializable attributes directly to zarr_group.create_dataset()/create_array().)
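A minimal sketch of that last option, assuming the traces datasets are named traces_seg{i} (as add_traces_to_zarr() names them) and the values are JSON-serializable; add_segment_attrs is a hypothetical helper, not existing spikeinterface code:

import zarr

def add_segment_attrs(zarr_group: zarr.Group, recording) -> None:
    # Store per-segment, JSON-serializable values in the attrs of the
    # corresponding traces dataset.
    for i, seg in enumerate(recording._recording_segments):
        dset = zarr_group[f"traces_seg{i}"]
        dset.attrs["sampling_frequency"] = seg.sampling_frequency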

Alternatively, maybe the properties of the segments could be restored entirely from the parent recording's properties during si.load() -> read_zarr_recording(). Not sure.

For now, my band-aid is to set some of the missing properties myself each time I load a recording back from disk.
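Something like this (a sketch, assuming the parent recording's sampling frequency is the right value for every segment):

reloaded = si.load("path/to/destination.zarr")
for seg in reloaded._recording_segments:
    # Patch segments whose sampling_frequency was dropped in the round-trip
    # (even if they carry a time vector, downstream code may still want it).
    if seg.sampling_frequency is None:
        seg.sampling_frequency = reloaded.get_sampling_frequency()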

grahamfindlay · Dec 13 '25

@grahamfindlay internally, there is no such thing as segment properties. The segments handle time information in two ways: either by inheriting the sampling frequency from the recording, or by having a time vector. When we set_times(), we explicitly add timestamps to an existing segment (which was instantiated with a sampling_frequency). That segment will then have both a sampling frequency and timestamps (in memory), but when saving/reloading the time vector takes precedence.
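A quick illustration of the two mechanisms (a sketch, using the one-segment recording from the example above):

import numpy as np

seg = recording._recording_segments[0]

# Mechanism 1: the segment inherits the recording's sampling frequency.
print(seg.sampling_frequency)   # e.g. 30000.13147632312
print(seg.time_vector is None)  # True

# Mechanism 2: set_times() attaches an explicit time vector to the segment.
times = np.arange(recording.get_num_samples(0)) / recording.get_sampling_frequency()
recording.set_times(times, segment_index=0)
print(seg.time_vector is not None)  # True; on save/reload the time vector takes precedence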

In other words, I think this is correct-ish behavior, but it could be a bit ambiguous. So my suggestions would be:

  • when setting times on a segment, set its sampling_frequency to None: time information is then carried by the timestamps alone
  • gaussian_filter should not depend on the segment's sampling frequency! (see the sketch after this list)
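For the second point, a rough sketch of the kind of change (parameter names are illustrative, not the actual spikeinterface code):

from spikeinterface.preprocessing.basepreprocessor import BasePreprocessorSegment

class GaussianFilterRecordingSegment(BasePreprocessorSegment):
    def __init__(self, parent_recording_segment, freq_min, freq_max, sampling_frequency):
        BasePreprocessorSegment.__init__(self, parent_recording_segment)
        self.freq_min = freq_min
        self.freq_max = freq_max
        # Take the sampling frequency from the parent recording (passed in by
        # the parent GaussianFilterRecording) instead of reading it off the
        # segment, which may carry only a time vector after a zarr round-trip.
        self.sampling_frequency = sampling_frequency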

I'll make a small PR to fix it.

alejoe91 · Dec 15 '25