nwb-schema icon indicating copy to clipboard operation
nwb-schema copied to clipboard

new neurodata_type: TTL pulse

Open bendichter opened this issue 6 years ago • 16 comments

It would be nice to have an intuitive and machine-readable neurodata_type for TTL pulses. Something like

neurodata_type_def: TTL
neurodata_type_ind: TimeSeries
datasets:
  - name: data
    shape:
      - None
    dims:
      - pulses
    dtype: int
 - name: keys
   shape:
     - None
   dims:
     - unique_labels
   dtype: int
 - name: labels
   shape:
     - None
   dims:
     - unique_labels
   dtype: text

bendichter avatar Sep 09 '19 18:09 bendichter

@rly when you return I'd be interested in your thoughts

bendichter avatar Sep 09 '19 19:09 bendichter

Jeep, that could be useful. For what would you use the labels and keys?

t-b avatar Sep 09 '19 21:09 t-b

The labels would give you a human readable description of what the events were. The keys would map those labels to values

bendichter avatar Sep 10 '19 05:09 bendichter

@bendichter But you could use epochs for that as well or?

t-b avatar Sep 10 '19 13:09 t-b

Sounds like you’re describing a CV, which is usually represented with a table. I’d recommend just using a DynamicTable for the mapping. And the data could be stored as a DynamicTableRegion. I’m not sure how that would play with the TimeSeries.data tho

ajtritt avatar Sep 10 '19 15:09 ajtritt

what do you mean by CV?

bendichter avatar Sep 10 '19 21:09 bendichter

controlled vocabulary.

ajtritt avatar Sep 10 '19 21:09 ajtritt

Oh I see. Yeah very similar but I think the one difference is that the actual value of the int is important to the labs as well as the label.

bendichter avatar Sep 10 '19 21:09 bendichter

That might not be a CV then. Are the values of the pulses enumerable?

ajtritt avatar Sep 10 '19 22:09 ajtritt

Enumerable? The ints are chosen by the researcher to signal specific events. There is a finite number of them and they generally are not sequential.

bendichter avatar Sep 11 '19 08:09 bendichter

Although TTL pulses are a sequence of events in time, I do not think inheriting this type from TimeSeries is most appropriate here. TTL pulses are not generic time series measurements (the data don't have units) and the timestamps are almost never uniformly sampled in time. It would be more appropriate to use or alias the LabeledEvents type https://github.com/NeurodataWithoutBorders/nwb-schema/pull/302/files which is similar to what you have here ("keys" = "labels", "labels" = "label_map")

In my experience, users sometimes work with all of the TTL pulses together in order of time, but more often they separate the pulses by value and work with only a subset of them, often in isolation (peri-stimulus time histogram), and sometimes together in order. Those use cases would be best served by two different storage options:

  1. as @bendichter described, store a single array of N (timestamp, value) pairs
  2. store K 1-D arrays of timestamps, where K is the number of unique pulse values.

For Option 1, if the user wants to get the data in the form of Option 2 (i.e. split the array by value into K 1-D arrays), a naive implementation takes O(NK) time - I'm not sure this can be done faster.

For Option 2, if the user wants to get the data in the form of Option 1 (i.e. create a single array of all of the pulse values in order of time), then they would have to do a K-way merge sort of the K 1-D arrays, which is O(N log K) time.

Given that difference in efficiency (at the limit) and that I think working with only one or a subset of pulse values is more common than working with all pulse values in order, I think Option 2 would be better.

rly avatar Sep 12 '19 19:09 rly

  1. store K 1-D arrays of timestamps, where K is the number of unique pulse values.

Could we store this in the same way we store spike units?

ajtritt avatar Sep 12 '19 19:09 ajtritt

Yes, a DynamicTable would work, but for some reason, it feels less intuitive to me (same for the units table). It's a little overkill, but it would organize the TTL pulses together. Granted, the user does not need to know (and to some degree should not need to know) the low-level details of how the data is structured.

Ultimately, I think that TTL pulses, spike unit times, and events in general, could, and probably should, be stored in the same way as each other. So, yes, option 3:

  1. Store TTL pulses as a DynamicTable with indexed column pulse_times, and normal column label.

rly avatar Sep 12 '19 23:09 rly

I worry about the proliferating use of DynamicTables, because it sort of side-steps the whole point of NWB:N, which is to provide good standards for meta-data. DynamicTable is basically an open door that allows you to store any type of data you want with minimal constraints. Any open door will lead to the temptation to use it for every datatype that is not currently supported, and I worry about the long term effects of that. This is useful for purely-meta-data structures like trials, where this flexibility is important and adherence to a strict standard is less important, but I worry when they are used for annotating acquired data. When they are used to annotate data, DynamicTables on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!

I agree that we need a way to express time series objects that are not measurements and are unitless. I think the proposed Events and/or LabelledEvents addresses this problem nicely, and I like that it generalizes to cases outside of TTL pulses because this has come up a few times and will continue to be needed. The one issue I have with the proposed extension is the names of the datasets. To me, label should be a string, i.e. the value of the dictionary, not the key.

Then there is the question of whether we should store a one object that captures all TTL pulses or several objects that each store a TTL pulse event type. This comes down to whether we want to make the data closer to what is acquired or closer to the semantic meaning/how it will be used an analysis. In this case I vote to make it closer to acquired with LabelledEvents because I imagine TTL pulses going in /acquisition. If users want to pull out specific events, we could encourage them to do that with Events objects in a processing module.

bendichter avatar Sep 13 '19 08:09 bendichter

@bendichter Coming back to this, when you wrote:

When they are used to annotate data, DynamicTables on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!

What do you mean by global attributes? I think it is fine for subtypes of DynamicTables to define attributes that apply to all rows of the table.

rly avatar Oct 11 '19 01:10 rly

What is the issue with linking individual spikes to snippets? I saw a bunch of issues/PRs on UnitSeries and it looked like a reasonable solution was agreed upon https://github.com/NeurodataWithoutBorders/nwb-schema/issues/248

rly avatar Oct 11 '19 01:10 rly