nwb-schema icon indicating copy to clipboard operation
nwb-schema copied to clipboard

Perceived(?) ambiguity w.r.t timestamps of SpikeEventSeries

Open ehennestad opened this issue 7 months ago • 16 comments

The documentation (doc key) for timestamps of SpikeEventSeries states that it should be a required property:

https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.ecephys.yaml#L97-L115

The definition of timestamps in TimeSeries is here:

https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.base.yaml#L197-L214

My understanding of inheritance in the specification is the following: If a dataset/attribute/link or a key of these are not defined in a sub-type, it inherits the value from its parent, or the nearest ancestor that defines the basic data type / key.

But that does not seem to be the case here. The only difference from the specification of timestamps in the SpikeEventSeries and in the TimeSeries is that the quantity is missing in SpikeEventSeries/timestamps. When the quantity is missing, the default value should be used, which in this case is 1 (required).

However, if the quantity should be determined based on inheritance, the SpikeEventSeries/timestamps dataset should be optional. I have tried to find an explanation in the NWB Specification Language documentation for which interpretation to use, but have not found anything.

In my understanding, the SpikeEventSeries/timestamps should look like this:

  - name: timestamps
    quantity: 1

basically just redefining the quantity, but not re-specifying all the unchanged specification keys

ehennestad avatar May 08 '25 08:05 ehennestad

I think the attributes of children should follow the parents over the defaults. If you want to override the parent, you must specify this explicitly.

bendichter avatar May 08 '25 14:05 bendichter

I agree with @bendichter . That feels intuitive to me.

However, HDMF was not designed to resolve the spec that way. When HDMF encounters a dataset that is defined with the same name ("timestamps") as a dataset defined in the parent type, then

  1. the dtypes are roughly checked for compatibility
  2. the new dataset spec completely replaces the parent dataset spec
  3. attributes from the parent dataset spec that are not redefined in the new dataset spec are added to the new dataset spec

So currently in HDMF, SpikeEventSeries.timestamps does not inherit the quantity: "?" from TimeSeries.timestamps. It instead adopts the default quantity: 1. Same for all other properties (dtype, shape, default name, value default value).

Do we want to continue or change this behavior?

The schema language doc does not explicitly describe the rules of inheritance.

rly avatar May 08 '25 15:05 rly

The PatchClampSeries/data and CurrentClampSeries/data seems to contradict what you outlined @rly :

PatchClampSeries/data:

https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.icephys.yaml#L14-L27

CurrentClampSeries/data https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.icephys.yaml#L44-L45

At least, in pynwb the CurrentClampSeries/data appears to inherit all the specification keys from the PatchClampSeries/data even though here the dataset is defined again, but with only the doc key, so all other keys should fall back to default values?

ehennestad avatar May 08 '25 18:05 ehennestad

Unfortunately, that is another oversight.

import pynwb
from pprint import pprint

pprint(pynwb.get_type_map().namespace_catalog.get_namespace("core").get_spec("CurrentClampSeries").get_dataset("data"))

returns

{'attributes': [{'doc': 'Base unit of measurement for working with the data. '
                        "which is fixed to 'volts'. Actual stored values are "
                        'not necessarily stored in these units. To access the '
                        "data in these units, multiply 'data' by 'conversion' "
                        "and add 'offset'.",
                 'dtype': 'text',
                 'name': 'unit',
                 'value': 'volts'},
                {'default_value': 1.0,
                 'doc': 'Scalar to multiply each element in data to convert it '
                        "to the specified 'unit'. If the data are stored in "
                        'acquisition system units or other units that require '
                        'a conversion to be interpretable, multiply the data '
                        "by 'conversion' to convert the data to the specified "
                        "'unit'. e.g. if the data acquisition system stores "
                        'values in this object as signed 16-bit integers '
                        '(int16 range -32,768 to 32,767) that correspond to a '
                        '5V range (-2.5V to 2.5V), and the data acquisition '
                        "system gain is 8000X, then the 'conversion' "
                        'multiplier to get from raw data acquisition values to '
                        'recorded volts is 2.5/32768/8000 = 9.5367e-9.',
                 'dtype': 'float32',
                 'name': 'conversion',
                 'required': False},
                {'default_value': 0.0,
                 'doc': 'Scalar to add to the data after scaling by '
                        "'conversion' to finalize its coercion to the "
                        "specified 'unit'. Two common examples of this include "
                        '(a) data stored in an unsigned type that requires a '
                        'shift after scaling to re-center the data, and (b) '
                        'specialized recording devices that naturally cause a '
                        'scalar offset with respect to the true units.',
                 'dtype': 'float32',
                 'name': 'offset',
                 'required': False},
                {'default_value': -1.0,
                 'doc': 'Smallest meaningful difference between values in '
                        'data, stored in the specified by unit, e.g., the '
                        'change in value of the least significant bit, or a '
                        'larger number if signal noise is known to be present. '
                        'If unknown, use -1.0.',
                 'dtype': 'float32',
                 'name': 'resolution',
                 'required': False},
                {'doc': 'Optionally describe the continuity of the data. Can '
                        'be "continuous", "instantaneous", or "step". For '
                        'example, a voltage trace would be "continuous", '
                        'because samples are recorded from a continuous '
                        'process. An array of lick times would be '
                        '"instantaneous", because the data represents distinct '
                        'moments in time. Times of image presentations would '
                        'be "step" because the picture remains the same until '
                        'the next timepoint. This field is optional, but is '
                        'useful in providing information about the underlying '
                        'data. It may inform the way this data is interpreted, '
                        'the way it is visualized, and what analysis methods '
                        'are applicable.',
                 'dtype': 'text',
                 'name': 'continuity',
                 'required': False}],
 'doc': 'Recorded voltage.',
 'name': 'data'}

dtype, dims, and shape from PatchClampSeries.data are not inherited...

rly avatar May 08 '25 18:05 rly

I was looking at this line and assumed it would inherit the data specifications https://github.com/NeurodataWithoutBorders/pynwb/blob/f956fbb30fdf361a9ae43a71fa509218012adb46/src/pynwb/icephys.py#L187

But I am still confused, Is it the correct then that all the default dataset specification keys apply to CurrentClampSeries/data, and that the value is supposed to be a scalar?

ehennestad avatar May 08 '25 18:05 ehennestad

I am looking at this:

The default behavior for shape is:

... shape: null indicating that the attribute/dataset is a scalar.

ehennestad avatar May 08 '25 18:05 ehennestad

The schema language docs needs to be corrected to our new understanding of the default behavior for shape. Our intention that is consistent with the schema and how HDMF works is that the default behavior for shape is any shape. This change in interpretation means that there are quite a few places in the schema where a scalar was intended, e.g., every description field, that needs an additional constraint to say that it is a scalar and not any shape. But it makes inheritance and weird cases like the above work without invalidating old data.

See https://github.com/hdmf-dev/hdmf-common-schema/issues/83

rly avatar May 08 '25 19:05 rly

I was looking at this line and assumed it would inherit the data specifications NeurodataWithoutBorders/pynwb@f956fbb/src/pynwb/icephys.py#L187

Correct. PyNWB makes CurrentClampSeries.data inherit from PatchClampSeries.data including any dtype and shape validation. These impose stronger constraints than how PyNWB/HDMF handles the spec after inheritance. A side effect of this is that the PyNWB/HDMF validator will say a 2D CurrentClampSeries.data is valid, but PyNWB does not allow you to create that.

But I am still confused, Is it the correct then that all the default dataset specification keys apply to CurrentClampSeries/data, and that the value is supposed to be a scalar?

We need to decide which behavior should be correct here, with the new understanding that the default value for shape should be any shape.

rly avatar May 08 '25 19:05 rly

Is the intention that the current specification for CurrentClampSeries.data should allow any data type and any shape?

ehennestad avatar May 08 '25 19:05 ehennestad

Sorry for all the back and forth, but I am a bit confused 😕

ehennestad avatar May 08 '25 19:05 ehennestad

Sorry for the confusion.

I think CurrentClampSeries.data as written should automatically inherit the numeric dtype, 1D shape, required-ness, and other properties from PatchClampSeries.data because it does not explicitly override them. We should make this change in HDMF. This change will likely affect other parts of the schema. We'll have to review all affected schema before releasing this.

Similarly, I think SpikeEventSeries.timestamps as written should inherit the quantity: "?" from TimeSeries.timestamps. Since it should be required in SpikeEventSeries (at least based on the doc), then we need to add quantity: 1 into the SpikeEventSeries schema.

@oruebel what do you think? I think you and Andrew set up the initial design of inheritance.

rly avatar May 12 '25 06:05 rly

In discussion with @oruebel:

Yes, child types that override a dataset or attribute of the parent type should inherit the dtype, shape, and other properties from the corresponding dataset or attribute of the parent type unless the child type redefines (overrides) them. We should change the behavior of HDMF to reflect this. I will create an issue.

@oruebel also suggested making all cases of overriding explicit in the NWB schema to prevent confusion. So even though CurrentClampSeries.data should inherit numeric dtype, 1D shape, etc. from PatchClampSeries.data, we should just specify those in CurrentClampSeries.data. This may introduce a maintenance burden if PatchClampSeries.data and X.data for all child types of PatchClampSeries need to be changed, but that is OK with us.

rly avatar May 16 '25 00:05 rly

You can track progress on the HDMF issue here: https://github.com/hdmf-dev/hdmf/issues/320

MatNWB should also follow the inheritance logic described above.

rly avatar May 16 '25 00:05 rly

@oruebel also suggested making all cases of overriding explicit in the NWB schema to prevent confusion. So even though CurrentClampSeries.data should inherit numeric dtype, 1D shape, etc. from PatchClampSeries.data, we should just specify those in CurrentClampSeries.data. This may introduce a maintenance burden if PatchClampSeries.data and X.data for all child types of PatchClampSeries need to be changed, but that is OK with us.

@rly @oruebel Let me see if I understand this:

  1. PatchClampSeries.data overrides some specification keys from the TimeSeries.data, i.e dtype, dims and shape. This is explicitly stated in the PatchClampSeries.data specification.
  2. CurrentClampSeries.data inherits from PatchClampSeries.data and there are no overrides. However, since the CurrentClampSeries.data is different from the TimeSeries.data the overrides of dtype, dims and shape should be repeated in the specification of CurrentClampSeries.data?

ehennestad avatar May 25 '25 07:05 ehennestad

  1. PatchClampSeries.data overrides some specification keys from the TimeSeries.data, i.e dtype, dims and shape. This is explicitly stated in the PatchClampSeries.data specification.

Correct. PatchClampSeries.data inherits all specification keys that are not explicitly overridden.

  1. CurrentClampSeries.data inherits from PatchClampSeries.data and there are no overrides. However, since the CurrentClampSeries.data is different from the TimeSeries.data the overrides of dtype, dims and shape should be repeated in the specification of CurrentClampSeries.data?

Yes. We will make that change to the schema to make it extra clear to a reader, but the behavior should be the same.

rly avatar May 27 '25 19:05 rly

@rly @oruebel

I would argue against introducing ad-hoc rules like this for the sake of clarity:

  1. It increases maintenance burden, as already noted.
  2. It undermines the value of having a schema language with inheritance—if we need to manually repeat inherited properties, we lose the benefits of abstraction and reuse.
  3. It may actually reduce clarity. If the schema and documentation diverge (e.g., inheritance is assumed (and not stated) in one place, but explicit (and stated) in another), it can become more confusing rather than less.

Instead, I’d suggest relying on the schema's inheritance behavior, but instead ensure that the documentation makes these inherited relationships visible and understandable to users.

ehennestad avatar May 30 '25 08:05 ehennestad