Recording segment properties don't survive round-trip to & from zarr
When writing recordings to zarr, properties of the recording object itself are stored, but properties of the segments are not. For example:
import spikeinterface as si
from numcodecs import Delta
from spikeinterface.core import ConcatenateSegmentRecording
from wavpack_numcodecs import WavPack

recording: ConcatenateSegmentRecording = my_recording_generator()
# Confirm that the recording and its lone segment have a sampling_frequency property.
assert recording.get_num_segments() == 1
sf_rec = recording.get_sampling_frequency()
sf_seg = recording._recording_segments[0].sampling_frequency
assert sf_rec == sf_seg
print(sf_seg) # 30000.13147632312
recording.save(
    format="zarr",
    folder="path/to/destination.zarr",
    compressor_by_dataset={"traces": WavPack(bps=2.25)},
    filters_by_dataset={"times": [Delta(dtype="float64")]},
    n_jobs=80,
)
reloaded_recording = si.load("path/to/destination.zarr")
print(reloaded_recording.get_sampling_frequency()) # 30000.13147632312
assert reloaded_recording.get_num_segments() == 1
assert reloaded_recording._recording_segments[0].sampling_frequency is None # Oh no!
This has downstream effects, like causing spikeinterface.preprocessing.gaussian_filter() to fail when GaussianFilterRecordingSegment tries to read the sampling frequency from the segment and finds None.
The issue is that in recording.save() -> ZarrRecordingExtractor.write_recording() -> add_recording_to_zarr_group(), the properties of the recording segments are never stored: add_traces_to_zarr() creates the dset via zarr_group.create_dataset(), but nothing about the segments goes into its attrs. A natural place to fix this would be the subsequent call to add_properties_and_annotations(zarr_group, recording): instead of only saving recording-level properties as new datasets on the "properties" group, it could also iterate through the segments. That could be a props_seg{i} group with one dataset per attribute if the attributes can be complex objects, or simply dset.attrs["sampling_frequency"] = ... if they are known to be JSON-serializable. (FYI, in Zarr v3 you can pass a dictionary of JSON-serializable attributes directly in the call to zarr_group.create_dataset()/create_array().)
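As a rough sketch of that direction, the per-segment attributes could be collected into one JSON-serializable dict per segment before being written into the zarr attrs. Note that collect_segment_attrs and the stored keys are illustrative placeholders, not the actual spikeinterface API:

```python
import json


def collect_segment_attrs(recording):
    """Build a JSON-serializable dict of per-segment attributes.

    Hypothetical helper: a real fix would live inside
    add_properties_and_annotations(); the keys stored here
    (sampling_frequency, t_start) are just examples.
    """
    attrs = {}
    for i, seg in enumerate(recording._recording_segments):
        attrs[f"props_seg{i}"] = {
            "sampling_frequency": seg.sampling_frequency,
            "t_start": getattr(seg, "t_start", None),
        }
    # Fail early if anything would not survive a JSON round-trip,
    # since zarr attrs must be JSON-serializable.
    json.dumps(attrs)
    return attrs
```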
Or maybe the segment properties could be restored entirely from the parent recording's properties during si.load() -> read_zarr_recording(). Not sure.
For now my band-aid is to just set some of the missing properties myself each time I load a recording back from disk.
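For reference, the band-aid can be as simple as copying the recording-level sampling frequency back onto each segment after loading. This assumes the top-level value is correct for every segment; restore_segment_sampling_frequency is my own helper, not a spikeinterface function:

```python
def restore_segment_sampling_frequency(recording):
    """Copy the recording-level sampling frequency onto any segment
    that lost it during the zarr round-trip (a band-aid, not a real fix)."""
    fs = recording.get_sampling_frequency()
    for seg in recording._recording_segments:
        if seg.sampling_frequency is None:
            seg.sampling_frequency = fs
    return recording
```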
@grahamfindlay internally, there is no such thing as segment properties.
The segments handle time information in one of two ways: either by inheriting the sampling frequency from the recording, or by having a time vector.
When we call set_times(), we explicitly add timestamps to an existing segment (one that was instantiated with a sampling_frequency). That segment will then have both a sampling frequency and timestamps in memory, but when saving/reloading, the time vector takes precedence.
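That precedence can be illustrated with a minimal pure-Python stand-in for a segment's time handling (the real spikeinterface segments are more involved; class and method bodies here are illustrative):

```python
class SegmentTimesSketch:
    """Minimal stand-in for a recording segment's time handling:
    an explicit time vector takes precedence over sampling_frequency."""

    def __init__(self, sampling_frequency, t_start=0.0):
        self.sampling_frequency = sampling_frequency
        self.t_start = t_start
        self.time_vector = None

    def set_times(self, times):
        # Explicit timestamps: the segment now holds both a sampling
        # frequency and a time vector in memory.
        self.time_vector = list(times)

    def get_times(self, num_samples):
        if self.time_vector is not None:  # the time vector wins
            return self.time_vector[:num_samples]
        return [self.t_start + i / self.sampling_frequency
                for i in range(num_samples)]
```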
In other words, I think this is correct-ish behavior, but it can be a bit ambiguous. So my suggestions are:
- when setting times on a segment, set `sampling_frequency` to None: time information is now carried by the timestamps
- the `gaussian_filter` should not depend on the segment sampling frequency!
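The first suggestion could look roughly like this (an illustrative sketch, not the actual PR; the class is a minimal stand-in for a recording segment):

```python
class SegmentSketch:
    """Minimal stand-in for a recording segment, showing the suggested
    set_times() behavior (illustrative, not the actual spikeinterface code)."""

    def __init__(self, sampling_frequency):
        self.sampling_frequency = sampling_frequency
        self.time_vector = None

    def set_times(self, times):
        self.time_vector = list(times)
        # Suggested change: the timestamps now carry all time information,
        # so drop the ambiguous per-segment sampling frequency.
        self.sampling_frequency = None
```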
I'll make a small PR to fix it