Perceived(?) ambiguity w.r.t. timestamps of SpikeEventSeries
The documentation (doc key) for timestamps of SpikeEventSeries states that it should be a required property:
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.ecephys.yaml#L97-L115
The definition of timestamps in TimeSeries is here:
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.base.yaml#L197-L214
My understanding of inheritance in the specification is the following: if a dataset/attribute/link, or a key of one of these, is not defined in a sub-type, it inherits the value from its parent, or from the nearest ancestor that defines that dataset/attribute/link or key.
But that does not seem to be the case here. The only difference between the specification of timestamps in SpikeEventSeries and in TimeSeries is that the quantity key is missing in SpikeEventSeries/timestamps. When the quantity key is missing, the default value should be used, which in this case is 1 (required).
However, if the quantity should be determined based on inheritance, the SpikeEventSeries/timestamps dataset should be optional. I have tried to find an explanation in the NWB Specification Language documentation for which interpretation to use, but have not found anything.
In my understanding, the SpikeEventSeries/timestamps spec should look like this:

```yaml
- name: timestamps
  quantity: 1
```

basically just redefining the quantity, but not re-specifying all the unchanged specification keys.
I think the attributes of children should follow the parents over the defaults. If you want to override the parent, you must specify this explicitly.
I agree with @bendichter. That feels intuitive to me.
However, HDMF was not designed to resolve the spec that way. When HDMF encounters a dataset that is defined with the same name ("timestamps") as a dataset defined in the parent type, then
- the dtypes are roughly checked for compatibility
- the new dataset spec completely replaces the parent dataset spec
- attributes from the parent dataset spec that are not redefined in the new dataset spec are added to the new dataset spec
So currently in HDMF, `SpikeEventSeries.timestamps` does not inherit the `quantity: "?"` from `TimeSeries.timestamps`; it instead adopts the default `quantity: 1`. The same applies to all other properties (dtype, shape, default name, value, default value).
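The difference between the two resolution strategies can be sketched as a dict merge. This is a toy simplification, not the real HDMF classes (attribute merging, the third bullet above, is elided):

```python
# Toy sketch of the two spec-resolution strategies discussed above.
# Specs are plain dicts; DEFAULTS stands in for the specification-language
# defaults. These are NOT the real HDMF spec classes.

DEFAULTS = {"quantity": 1, "shape": None, "dtype": None}

def resolve_current(parent, child):
    """Current HDMF-style behavior: the child dataset spec completely
    replaces the parent's; omitted keys fall back to the language defaults."""
    return {**DEFAULTS, **child}

def resolve_inherited(parent, child):
    """Proposed behavior: omitted keys are taken from the parent; the
    defaults apply only if neither parent nor child sets the key."""
    return {**DEFAULTS, **parent, **child}

# TimeSeries.timestamps is optional; SpikeEventSeries redefines it
# without a quantity key.
parent = {"name": "timestamps", "quantity": "?", "dtype": "float64"}
child = {"name": "timestamps"}

print(resolve_current(parent, child)["quantity"])    # 1   -> required
print(resolve_inherited(parent, child)["quantity"])  # '?' -> optional
```

The same redefined-but-mostly-empty child spec resolves to "required" under the current behavior and "optional" under the inheritance-first behavior, which is exactly the ambiguity in question.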
Do we want to continue or change this behavior?
The schema language doc does not explicitly describe the rules of inheritance.
PatchClampSeries/data and CurrentClampSeries/data seem to contradict what you outlined, @rly:
PatchClampSeries/data:
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.icephys.yaml#L14-L27
CurrentClampSeries/data:
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/0c7896b117dd1207a6bad737781d72c88a5421a5/core/nwb.icephys.yaml#L44-L45
At least in pynwb, CurrentClampSeries/data appears to inherit all the specification keys from PatchClampSeries/data, even though the dataset is defined again here with only the doc key, so shouldn't all other keys fall back to their default values?
Unfortunately, that is another oversight.
```python
import pynwb
from pprint import pprint

pprint(pynwb.get_type_map().namespace_catalog.get_namespace("core").get_spec("CurrentClampSeries").get_dataset("data"))
```
returns

```python
{'attributes': [{'doc': 'Base unit of measurement for working with the data. '
                        "which is fixed to 'volts'. Actual stored values are "
                        'not necessarily stored in these units. To access the '
                        "data in these units, multiply 'data' by 'conversion' "
                        "and add 'offset'.",
                 'dtype': 'text',
                 'name': 'unit',
                 'value': 'volts'},
                {'default_value': 1.0,
                 'doc': 'Scalar to multiply each element in data to convert it '
                        "to the specified 'unit'. If the data are stored in "
                        'acquisition system units or other units that require '
                        'a conversion to be interpretable, multiply the data '
                        "by 'conversion' to convert the data to the specified "
                        "'unit'. e.g. if the data acquisition system stores "
                        'values in this object as signed 16-bit integers '
                        '(int16 range -32,768 to 32,767) that correspond to a '
                        '5V range (-2.5V to 2.5V), and the data acquisition '
                        "system gain is 8000X, then the 'conversion' "
                        'multiplier to get from raw data acquisition values to '
                        'recorded volts is 2.5/32768/8000 = 9.5367e-9.',
                 'dtype': 'float32',
                 'name': 'conversion',
                 'required': False},
                {'default_value': 0.0,
                 'doc': 'Scalar to add to the data after scaling by '
                        "'conversion' to finalize its coercion to the "
                        "specified 'unit'. Two common examples of this include "
                        '(a) data stored in an unsigned type that requires a '
                        'shift after scaling to re-center the data, and (b) '
                        'specialized recording devices that naturally cause a '
                        'scalar offset with respect to the true units.',
                 'dtype': 'float32',
                 'name': 'offset',
                 'required': False},
                {'default_value': -1.0,
                 'doc': 'Smallest meaningful difference between values in '
                        'data, stored in the specified by unit, e.g., the '
                        'change in value of the least significant bit, or a '
                        'larger number if signal noise is known to be present. '
                        'If unknown, use -1.0.',
                 'dtype': 'float32',
                 'name': 'resolution',
                 'required': False},
                {'doc': 'Optionally describe the continuity of the data. Can '
                        'be "continuous", "instantaneous", or "step". For '
                        'example, a voltage trace would be "continuous", '
                        'because samples are recorded from a continuous '
                        'process. An array of lick times would be '
                        '"instantaneous", because the data represents distinct '
                        'moments in time. Times of image presentations would '
                        'be "step" because the picture remains the same until '
                        'the next timepoint. This field is optional, but is '
                        'useful in providing information about the underlying '
                        'data. It may inform the way this data is interpreted, '
                        'the way it is visualized, and what analysis methods '
                        'are applicable.',
                 'dtype': 'text',
                 'name': 'continuity',
                 'required': False}],
 'doc': 'Recorded voltage.',
 'name': 'data'}
```
dtype, dims, and shape from PatchClampSeries.data are not inherited...
I was looking at this line and assumed it would inherit the data specifications https://github.com/NeurodataWithoutBorders/pynwb/blob/f956fbb30fdf361a9ae43a71fa509218012adb46/src/pynwb/icephys.py#L187
But I am still confused. Is it correct, then, that all the default dataset specification keys apply to CurrentClampSeries/data, and that the value is supposed to be a scalar?
I am looking at this:

> The default behavior for shape is:
> ... `shape: null` indicating that the attribute/dataset is a scalar.
The schema language docs need to be corrected to reflect our new understanding of the default behavior for shape. Our intention, which is consistent with the schema and with how HDMF works, is that the default behavior for shape is any shape. This change in interpretation means there are quite a few places in the schema where a scalar was intended, e.g., every description field, that need an additional constraint saying the value is a scalar rather than any shape. But it makes inheritance, and weird cases like the above, work without invalidating old data.
See https://github.com/hdmf-dev/hdmf-common-schema/issues/83
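To make the two readings of `shape: null` concrete, here is a toy shape check under the revised interpretation. `shape_matches` is a made-up helper for illustration, not HDMF's actual validator:

```python
import numpy as np

def shape_matches(spec_shape, data):
    """Toy shape check under the revised reading: spec_shape=None accepts
    ANY shape, including scalars. (Under the old reading, None would have
    required a scalar instead.)"""
    arr = np.asarray(data)
    if spec_shape is None:
        return True  # any shape, including scalars
    if len(spec_shape) != arr.ndim:
        return False
    # None inside the tuple means "any length" along that dimension
    return all(s is None or s == d for s, d in zip(spec_shape, arr.shape))

print(shape_matches(None, 5))          # True: a scalar is allowed
print(shape_matches(None, [[1, 2]]))   # True: 2D is also allowed under "any shape"
print(shape_matches((None,), [1, 2]))  # True: 1D of any length
print(shape_matches((None,), [[1]]))   # False: 2D rejected by a 1D spec
```

Under this reading, a CurrentClampSeries.data spec whose resolved shape is None would validate for any array shape, which is why the explicit scalar constraints mentioned above become necessary wherever a scalar was actually intended.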
> I was looking at this line and assumed it would inherit the data specifications NeurodataWithoutBorders/pynwb@f956fbb/src/pynwb/icephys.py#L187
Correct. PyNWB makes CurrentClampSeries.data inherit from PatchClampSeries.data including any dtype and shape validation. These impose stronger constraints than how PyNWB/HDMF handles the spec after inheritance. A side effect of this is that the PyNWB/HDMF validator will say a 2D CurrentClampSeries.data is valid, but PyNWB does not allow you to create that.
> But I am still confused. Is it correct, then, that all the default dataset specification keys apply to CurrentClampSeries/data, and that the value is supposed to be a scalar?
We need to decide which behavior should be correct here, with the new understanding that the default value for shape should be any shape.
Is the intention that the current specification for CurrentClampSeries.data should allow any data type and any shape?
Sorry for all the back and forth, but I am a bit confused 😕
Sorry for the confusion.
I think CurrentClampSeries.data as written should automatically inherit the numeric dtype, 1D shape, required-ness, and other properties from PatchClampSeries.data because it does not explicitly override them. We should make this change in HDMF. This change will likely affect other parts of the schema. We'll have to review all affected schema before releasing this.
Similarly, I think SpikeEventSeries.timestamps as written should inherit the quantity: "?" from TimeSeries.timestamps. Since it should be required in SpikeEventSeries (at least based on the doc), we need to add quantity: 1 to the SpikeEventSeries schema.
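Concretely, under the inheritance-first behavior the fix would restate just the quantity key, matching the snippet proposed at the top of the thread (a sketch, not verified against the released schema):

```yaml
# Hypothetical SpikeEventSeries override: inherit everything from
# TimeSeries.timestamps except quantity, which is pinned to "required".
- name: timestamps
  quantity: 1
```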
@oruebel what do you think? I think you and Andrew set up the initial design of inheritance.
In discussion with @oruebel:
Yes, child types that override a dataset or attribute of the parent type should inherit the dtype, shape, and other properties from the corresponding dataset or attribute of the parent type unless the child type redefines (overrides) them. We should change the behavior of HDMF to reflect this. I will create an issue.
@oruebel also suggested making all cases of overriding explicit in the NWB schema to prevent confusion. So even though CurrentClampSeries.data should inherit numeric dtype, 1D shape, etc. from PatchClampSeries.data, we should just specify those in CurrentClampSeries.data. This may introduce a maintenance burden if PatchClampSeries.data and X.data for all child types of PatchClampSeries need to be changed, but that is OK with us.
You can track progress on the HDMF issue here: https://github.com/hdmf-dev/hdmf/issues/320
MatNWB should also follow the inheritance logic described above.
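With the explicit-override style @oruebel suggests, CurrentClampSeries.data would restate the keys it relies on from PatchClampSeries.data. Roughly (a sketch based on the PatchClampSeries.data spec linked above, not verified against the released schema files):

```yaml
# Hypothetical explicit restatement in CurrentClampSeries, repeating the
# constraints that would otherwise be inherited from PatchClampSeries.data:
- name: data
  dtype: numeric
  dims:
  - num_times
  shape:
  - null
  doc: Recorded voltage.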
> @oruebel also suggested making all cases of overriding explicit in the NWB schema to prevent confusion. So even though CurrentClampSeries.data should inherit numeric dtype, 1D shape, etc. from PatchClampSeries.data, we should just specify those in CurrentClampSeries.data. This may introduce a maintenance burden if PatchClampSeries.data and X.data for all child types of PatchClampSeries need to be changed, but that is OK with us.
@rly @oruebel Let me see if I understand this:
`PatchClampSeries.data` overrides some specification keys from `TimeSeries.data`, i.e. `dtype`, `dims` and `shape`. This is explicitly stated in the `PatchClampSeries.data` specification.

`CurrentClampSeries.data` inherits from `PatchClampSeries.data` and there are no overrides. However, since `CurrentClampSeries.data` is different from `TimeSeries.data`, the overrides of `dtype`, `dims` and `shape` should be repeated in the specification of `CurrentClampSeries.data`?
> `PatchClampSeries.data` overrides some specification keys from `TimeSeries.data`, i.e. `dtype`, `dims` and `shape`. This is explicitly stated in the `PatchClampSeries.data` specification.
Correct. PatchClampSeries.data inherits all specification keys that are not explicitly overridden.
> `CurrentClampSeries.data` inherits from `PatchClampSeries.data` and there are no overrides. However, since `CurrentClampSeries.data` is different from `TimeSeries.data`, the overrides of `dtype`, `dims` and `shape` should be repeated in the specification of `CurrentClampSeries.data`?
Yes. We will make that change to the schema to make it extra clear to a reader, but the behavior should be the same.
@rly @oruebel
I would argue against introducing ad-hoc rules like this for the sake of clarity:
- It increases maintenance burden, as already noted.
- It undermines the value of having a schema language with inheritance: if we need to manually repeat inherited properties, we lose the benefits of abstraction and reuse.
- It may actually reduce clarity. If the schema and documentation diverge (e.g., inheritance is assumed (and not stated) in one place, but explicit (and stated) in another), it can become more confusing rather than less.
Instead, I’d suggest relying on the schema's inheritance behavior, and ensuring that the documentation makes these inherited relationships visible and understandable to users.