
Ill-defined time bins in ToFrame

Open bauerfe opened this issue 10 months ago • 5 comments

ToFrame creates frames from events by binning them in different ways.

However, the resulting bins might not always be as expected. In particular, when binning events into frames by a fixed time window, one can set either the number of bins or the duration of each bin. There are, however, two more degrees of freedom in setting time bins that currently cannot be chosen freely: the start and end times of the event stream to be binned. These values are currently inferred from the provided data as the timestamps of the first and last events.

Assume there is a recording of some event stream that lasted 1 second, and someone wants to generate frames of 200 ms each. Intuitively, one would expect to get 5 frames, with bin edges at (0 ms, 200 ms, [...], 800 ms, 1 s). However, if the first event does not coincide with the start of the recording but comes, for instance, 100 ms after the recording onset, and the last one at 900 ms, there will only be 4 frames, with bin edges at (100 ms, 300 ms, 500 ms, 700 ms, 900 ms).
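The arithmetic of this example can be reproduced in a few lines of plain Python. This is only a sketch of the effect described above, not tonic's actual implementation; the helper `bin_edges` and the event timestamps are made up for illustration.

```python
def bin_edges(start_ms, end_ms, bin_size_ms):
    """Return the edges of all full bins of width bin_size_ms between start_ms and end_ms."""
    n_bins = (end_ms - start_ms) // bin_size_ms
    return [start_ms + i * bin_size_ms for i in range(n_bins + 1)]


# Hypothetical event timestamps (in ms) from a recording that lasted 1 s
event_times_ms = [100, 250, 430, 610, 900]

# Range inferred from the data: first and last event define start and end
inferred = bin_edges(event_times_ms[0], event_times_ms[-1], 200)

# Range fixed by the recording itself: 0 ms to 1000 ms
fixed = bin_edges(0, 1000, 200)

print(inferred)  # [100, 300, 500, 700, 900] -> 4 frames
print(fixed)     # [0, 200, 400, 600, 800, 1000] -> 5 frames
```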

There are at least two issues with this behavior:

  1. If the recording has been synchronized with some other recording, e.g. of another sensor, I would like my frames to remain in sync with the other recording. This is not the case if the bin edges are data dependent.
  2. In a machine learning setting life is often much easier if all the data has the same shape. This is not the case if the number of frames changes for each sample.

Note that setting the number of bins instead of the bin size is not a valid solution. In that case the bin size becomes unpredictable, which can be an issue in itself and also conflicts with point 1 above.

To have the bins well-defined, it is necessary to define at least three of the four parameters (bin-size, number of bins, start time, end time) together.
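As a rough sketch of how the fourth parameter follows from the other three: the function `resolve_bins` below is hypothetical and not part of tonic. For simplicity it assumes that exactly three values are given and that all times use the same unit.

```python
def resolve_bins(bin_size=None, n_bins=None, start=None, end=None):
    """Fill in the one missing value among bin_size, n_bins, start and end.

    Hypothetical helper: exactly one argument must be None.
    """
    given = [bin_size, n_bins, start, end]
    if sum(value is None for value in given) != 1:
        raise ValueError("specify exactly three of the four parameters")

    if bin_size is None:
        bin_size = (end - start) / n_bins
    elif n_bins is None:
        n_bins, remainder = divmod(end - start, bin_size)
        if remainder:
            raise ValueError("time range is not a multiple of bin_size")
        n_bins = int(n_bins)
    elif start is None:
        start = end - n_bins * bin_size
    else:
        end = start + n_bins * bin_size
    return bin_size, n_bins, start, end


# 200 ms bins over a 1 s recording that starts at 0 ms
print(resolve_bins(bin_size=200, start=0, end=1000))  # (200, 5, 0, 1000)
```

With the range fixed by `start` and `end`, the bin edges no longer depend on when the first and last events happen to occur.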

bauerfe avatar Aug 10 '23 08:08 bauerfe

This is a very good point, thanks a lot for the detailed explanation. So what are the use cases that should be covered?

  1. Binning by time windows needs 3 out of 4: bin size/time window/dt, number of bins, start time, end time. Question: what to do if the events do not all start at a given time step, for example if they have previously been sliced from a longer recording into smaller segments? Then a single start/end time parameter no longer holds for all slices. One option would be to have the user subtract multiples of the slicing time beforehand, so that a common start time (of zero) is valid again. Example: one recording of 1 s length is sliced into 4 chunks of 250 ms. In the 4 chunks, the first event timestamps are 3, 259, 520 and 750 ms, so potentially later than the start of the respective slice, as you describe in the issue. Then one would need to define a transform before ToFrame that subtracts (events['t']//250ms) * 250ms from each recording, and then we can set the start time to 0. That would make the first timestamps 3, 9, 20 and 0 ms in the example given.
  2. Binning by number of events: just needs n_events
  3. Binning into a fixed number of bins that are equally distributed over time. This can be used when we know that the data recordings are all roughly the same length. It's a convenience method of sorts and could be calculated by providing start_time, end_time and n_bins in the new version.
  4. Currently we also support binning into a fixed number of bins that are split equally across the number of events. I haven't seen anyone using it though, so I think we should drop it.
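A minimal sketch of the realignment from use case 1, using the numbers from that example. The function `realign_to_slice_start` is hypothetical and not an existing tonic transform. It assumes events are a structured NumPy array with a timestamp field `t` (in ms here), and that all events in a chunk belong to the slice containing the chunk's first event.

```python
import numpy as np

# Assumed minimal event layout: only a timestamp field "t", in ms
dtype = np.dtype([("t", np.int64)])


def realign_to_slice_start(events, slice_time):
    """Shift timestamps so that the slice containing the first event starts at 0."""
    aligned = events.copy()
    if len(events) == 0:
        return aligned
    # Start time of the slice that the first event falls into
    offset = (events["t"][0] // slice_time) * slice_time
    aligned["t"] = events["t"] - offset
    return aligned


# Four chunks of a 1 s recording, sliced every 250 ms (timestamps made up,
# first timestamps taken from the example: 3, 259, 520, 750)
chunk_times = [[3, 120, 249], [259, 300, 499], [520, 610, 700], [750, 800, 999]]
chunks = [np.array([(t,) for t in times], dtype=dtype) for times in chunk_times]

aligned = [realign_to_slice_start(chunk, 250) for chunk in chunks]
first_timestamps = [int(chunk["t"][0]) for chunk in aligned]
print(first_timestamps)  # [3, 9, 20, 0]
```

After this step every chunk covers the range 0 to 250 ms, so a common start time of 0 would apply to all of them.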

biphasic avatar Aug 10 '23 09:08 biphasic

We could re-use this transform for it by adding a slice_time parameter. It currently only subtracts the first timestamp, which was meant to solve a related problem: https://tonic.readthedocs.io/en/latest/generated/tonic.transforms.TimeAlignment.html#tonic.transforms.TimeAlignment

biphasic avatar Aug 10 '23 09:08 biphasic

Or we could be even cleverer and provide an option to make SliceByTime subtract the slice_time during slicing itself. It needs to be optional though, since this might not always be desired.

biphasic avatar Aug 10 '23 09:08 biphasic

we could have a reset_timestamps parameter in here https://tonic.readthedocs.io/en/latest/_modules/tonic/slicers.html#SliceByTime
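What such an option could look like, as a rough sketch: the function `slice_by_time` below is a simplified stand-in written for illustration, not the existing `SliceByTime` class, and `reset_timestamps` is the proposed (not yet existing) parameter. It assumes a structured NumPy array with a timestamp field `t` and windows that start at t = 0.

```python
import numpy as np


def slice_by_time(events, slice_time, reset_timestamps=False):
    """Split events into consecutive windows of length slice_time, starting at t = 0.

    If reset_timestamps is True, the start time of each window is subtracted
    from the timestamps of the events in that window.
    """
    t = events["t"]
    n_slices = int(t.max() // slice_time) + 1 if len(t) else 0
    slices = []
    for i in range(n_slices):
        start = i * slice_time
        chunk = events[(t >= start) & (t < start + slice_time)].copy()
        if reset_timestamps:
            chunk["t"] -= start
        slices.append(chunk)
    return slices


# Hypothetical events with timestamps in ms
dtype = np.dtype([("t", np.int64)])
events = np.array([(t,) for t in [3, 120, 259, 300, 520, 750, 999]], dtype=dtype)

original = slice_by_time(events, 250)
reset = slice_by_time(events, 250, reset_timestamps=True)

print([chunk["t"].tolist() for chunk in original])  # [[3, 120], [259, 300], [520], [750, 999]]
print([chunk["t"].tolist() for chunk in reset])     # [[3, 120], [9, 50], [20], [0, 249]]
```

Keeping the default at `False` leaves the current behavior unchanged for anyone who relies on the original timestamps.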

biphasic avatar Aug 10 '23 09:08 biphasic

  1. Binning by time windows needs 3 out of 4: bin size/time window/dt, number of bins, start time, end time. Question: what to do if the events do not all start at a given time step, for example if they have previously been sliced from a longer recording into smaller segments? Then a single start/end time parameter no longer holds for all slices. One option would be to have the user subtract multiples of the slicing time beforehand, so that a common start time (of zero) is valid again. Example: one recording of 1 s length is sliced into 4 chunks of 250 ms. In the 4 chunks, the first event timestamps are 3, 259, 520 and 750 ms, so potentially later than the start of the respective slice, as you describe in the issue. Then one would need to define a transform before ToFrame that subtracts (events['t']//250ms) * 250ms from each recording, and then we can set the start time to 0. That would make the first timestamps 3, 9, 20 and 0 ms in the example given.

Good point. I've recently come across this issue. I think the solution you suggest would help here.

bauerfe avatar Aug 10 '23 09:08 bauerfe