VectorData for TimeSeries `data`
I am trying to make an `NWBGroupSpec` that extends `TimeSeries`, the main goal being that it accepts indexed data via the `VectorData` and `VectorIndex` types. Here’s the repo for reference.
I would like it to override the `data` field, so I do:
```python
from pynwb.spec import NWBGroupSpec

PointCloudSeries = NWBGroupSpec(
    doc='type for storing time-varying 3D point clouds',
    neurodata_type_def='PointCloudSeries',
    neurodata_type_inc='TimeSeries',
)
PointCloudSeries.add_dataset(
    name='data',
    neurodata_type_inc='VectorData',
    doc='datapoint locations over time',
    dims=('time', '[x, y, z]'),
    shape=(None, 3),
    dtype='float',
    quantity='?',
)
```
The new type can be imported, but when I try to instantiate it:
```python
from datetime import datetime
from pynwb import NWBFile
from ndx_pointcloudseries import PointCloudSeries
from hdmf.common.table import VectorIndex, VectorData

nwb = NWBFile('session_description', 'identifier', datetime.now().astimezone())

data = [[1., 1., 1.], [2., 2., 2.], [1., 2., 1.]]
data_vect = VectorData(name='data', description='desc', data=data)

indexes = [2, 3]
data_ind = VectorIndex(name='data_index', data=indexes, target=data_vect)

pcs = PointCloudSeries(
    name='PointCloudSeries',
    data=data_vect,
    data_index=data_ind,
    rate=10.
)
```
I get:
```
TypeError Traceback (most recent call last)
<ipython-input-4-4b65ab8f601c> in <module>
16 data=data_vect,
17 data_index=data_ind,
---> 18 rate=10.
19 )
20 nwb.add_acquisition(pcs)
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
457 if parse_err:
458 msg = ', '.join(parse_err)
--> 459 raise_from(ExceptionType(msg), None)
460
461 return func(self, **parsed['args'])
~\AppData\Roaming\Python\Python37\site-packages\six.py in raise_from(value, from_value)
TypeError: incorrect type for 'data' (got 'VectorData', expected 'ndarray, list, tuple, Dataset, HDMFDataset, AbstractDataChunkIterator, DataIO or TimeSeries')
```
The number of datapoints changes over time, so the data needs to be indexed, and we also wanted to leverage the time-slicing methods that come with `TimeSeries`. So, I have some questions:
- Is it possible to override existing fields when inheriting from an existing group?
- If it is possible, would the time-slicing methods of `TimeSeries` work with indexed data?
- Otherwise, what would you suggest I do? Maybe use a `DynamicTable`?
Thanks!
Checklist
- [x] Have you ensured the feature or change was not already reported?
- [x] Have you included a brief and descriptive title?
- [x] Have you included a clear description of the problem you are trying to solve?
- [x] Have you included a minimal code snippet that reproduces the issue you are encountering?
- [x] Have you checked our Contributing document?
> Otherwise, what would you suggest I do? Maybe use a `DynamicTable`?
As a first test, I would suggest adding `VectorData` as an allowed type for the `data` argument of the `TimeSeries` constructor, here:
https://github.com/NeurodataWithoutBorders/pynwb/blob/eeef0eb8ff4119b1a69bd172db1e98827299b278/src/pynwb/base.py#L104
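For orientation, you can inspect which types the `data` docval currently allows, and the experiment is simply to extend that tuple in `src/pynwb/base.py`. A minimal sketch, assuming `get_docval` from `hdmf.utils` (the exact docval contents vary by pynwb version):

```python
from hdmf.utils import get_docval
from pynwb import TimeSeries

# Inspect the docval entry that currently rejects VectorData
data_arg, = get_docval(TimeSeries.__init__, 'data')
print(data_arg['type'])  # e.g. ('array_data', 'data', TimeSeries)

# The suggested experiment is to extend that tuple in src/pynwb/base.py,
# roughly like this (sketch only; the exact entry depends on the pynwb version):
#   {'name': 'data',
#    'type': ('array_data', 'data', TimeSeries, VectorData),  # VectorData added
#    'doc': ...},
```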
If this works, it would at least tell us that read/write can work in principle. I'm not sure about other functionality of `TimeSeries`, but let's take this issue step by step.
> Is it possible to override existing fields when inheriting from an existing group?
It is possible to refine the spec of existing fields, but not to overwrite them. For example, you can change the dtype of a dataset, but you can't change its neurodata_type (you can only reuse existing neurodata_types or create new ones). In this particular case you are adding a neurodata_type to `TimeSeries.data`, which did not have a type before. This is a corner case that I don't think we have encountered before, and I'm not sure whether it is allowed. @ajtritt do you know?
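To illustrate the refinement case with a hypothetical extension (the names here are made up for the example): narrowing an inherited dataset's dtype is fine, whereas attaching a neurodata_type to the untyped `TimeSeries.data` is the untested corner case:

```python
from pynwb.spec import NWBGroupSpec

# Legal refinement: extend TimeSeries and narrow the dtype of the inherited
# 'data' dataset, without giving it a neurodata_type.
RefinedSeries = NWBGroupSpec(
    doc='TimeSeries whose data is refined to float32',
    neurodata_type_def='RefinedSeries',
    neurodata_type_inc='TimeSeries',
)
RefinedSeries.add_dataset(
    name='data',
    doc='same dataset as TimeSeries.data, with a refined dtype',
    dtype='float32',  # refinement: TimeSeries.data leaves the dtype open
)
```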
> If it is possible, would the time-slicing methods of `TimeSeries` work with indexed data?
I'm not sure this would work right out of the box. E.g., in your case you set `data` to the `VectorData` object, but you would actually need to slice against the `VectorIndex` dataset for time slicing. I would imagine you would need to set `TimeSeries.data` to the `VectorIndex` and make sure that this is handled in the `ObjectMapper`.
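A small sketch of why the index is the thing to slice, using the data from your snippet above: each entry of the `VectorIndex` marks where one time step's points end in the flat `VectorData`, and `VectorIndex.__getitem__` resolves that for you:

```python
from hdmf.common.table import VectorData, VectorIndex

data = VectorData(name='data', description='flat xyz points',
                  data=[[1., 1., 1.], [2., 2., 2.], [1., 2., 1.]])
# index entry i is the end offset of time step i in the flat data
index = VectorIndex(name='data_index', data=[2, 3], target=data)

print(index[0])  # time step 0 -> data[0:2], the first two points
print(index[1])  # time step 1 -> data[2:3], the last point
```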
Thanks for the explanation, @oruebel!
I added `VectorData` as an allowed type on line 104 as suggested, and now I can construct the `PointCloudSeries` object:
```python
pcs = PointCloudSeries(
    name='PointCloudSeries',
    data=data_vect,
    data_index=data_ind,
    rate=10.
)
nwb.add_acquisition(pcs)

print(nwb.acquisition['PointCloudSeries'])
```
gives:
```
PointCloudSeries abc.PointCloudSeries at 0x1661569459016
Fields:
  comments: no comments
  conversion: 1.0
  data: data <class 'hdmf.common.table.VectorData'>
  data_index: data_index <class 'hdmf.common.table.VectorIndex'>
  description: no description
  rate: 10.0
  resolution: -1.0
  starting_time: 0.0
```
Now the error happens when trying to write to file:
```python
from pynwb import NWBHDF5IO

with NWBHDF5IO('test.nwb', 'w') as io:
    io.write(nwb)
```
gives:
```
C:\Users\Luiz\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\build\map.py:1041: OrphanContainerWarning: 'data' (VectorData) for 'PointCloudSeries' (PointCloudSeries)
warnings.warn(msg, OrphanContainerWarning)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __add_refs(self)
528 try:
--> 529 call()
530 except KeyError:
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in _filler()
646 def _filler():
--> 647 obj.attrs[key] = self.__get_ref(value)
648 return _filler
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
460
--> 461 return func(self, **parsed['args'])
462 else:
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __get_ref(self, **kwargs)
1076 else:
-> 1077 return self.__file[path].ref
1078
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\h5py\_hl\group.py in __getitem__(self, name)
263 else:
--> 264 oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
265
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5o.pyx in h5py.h5o.open()
KeyError: "Unable to open object (object 'data' doesn't exist)"
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-2-a411a132fe37> in <module>
23 # Write nwb file
24 with NWBHDF5IO('test_pointcloudseries.nwb', 'w') as io:
---> 25 io.write(nwb)
26
27 ## Read nwb file and check its content
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
459 raise_from(ExceptionType(msg), None)
460
--> 461 return func(self, **parsed['args'])
462 else:
463 def func_call(*args, **kwargs):
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in write(self, **kwargs)
267
268 cache_spec = popargs('cache_spec', kwargs)
--> 269 call_docval_func(super(HDF5IO, self).write, kwargs)
270 if cache_spec:
271 ref = self.__file.attrs.get(SPEC_LOC_ATTR)
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in call_docval_func(func, kwargs)
348 def call_docval_func(func, kwargs):
349 fargs, fkwargs = fmt_docval_args(func, kwargs)
--> 350 return func(*fargs, **fkwargs)
351
352
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
459 raise_from(ExceptionType(msg), None)
460
--> 461 return func(self, **parsed['args'])
462 else:
463 def func_call(*args, **kwargs):
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\io.py in write(self, **kwargs)
42 container = popargs('container', kwargs)
43 f_builder = self.__manager.build(container, source=self.__source)
---> 44 self.write_builder(f_builder, **kwargs)
45
46 @abstractmethod
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\utils.py in func_call(*args, **kwargs)
459 raise_from(ExceptionType(msg), None)
460
--> 461 return func(self, **parsed['args'])
462 else:
463 def func_call(*args, **kwargs):
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in write_builder(self, **kwargs)
511 self.write_link(self.__file, lbldr)
512 self.set_attributes(self.__file, f_builder.attributes)
--> 513 self.__add_refs()
514 self.__exhaust_dcis()
515
~\Anaconda3\envs\nwbn_conversion\lib\site-packages\hdmf\backends\hdf5\h5tools.py in __add_refs(self)
530 except KeyError:
531 if id(call) in failed:
--> 532 raise RuntimeError('Unable to resolve reference')
533 failed.add(id(call))
534 self.__ref_queue.append(call)
RuntimeError: Unable to resolve reference
```
Any ideas?
Update: the `data` in the error `KeyError: "Unable to open object (object 'data' doesn't exist)"` is the name given to the `VectorData` object. If I change it to e.g. `data_name`, the error becomes `KeyError: "Unable to open object (object 'data_name' doesn't exist)"`. I couldn't figure out anything beyond that, though.
I'm wondering whether there may be an issue with the ObjectMapping here, but I have not had the chance to dig deeper. @ajtritt do you have any idea?
> I'm wondering whether there may be an issue with the ObjectMapping here
@oruebel you are correct. ObjectMapping is choking on having a Container passed as a concrete dataset.
The only way around this would be to change the `TimeSeries.data` spec to be a `VectorData`. I'm open to such a change, but it would probably be disruptive, so we should discuss that further.
Going the route of defining `PointCloudSeries` as a `DynamicTable` would probably be easier, though.
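For example, a rough sketch of that route with a plain hdmf `DynamicTable` (the column names here are made up for the example): one row per time step, a ragged indexed column for the points, and a regular column for the timestamps:

```python
from hdmf.common.table import DynamicTable

table = DynamicTable(name='PointCloudSeries',
                     description='time-varying 3D point clouds')
# index=True creates the VectorData/VectorIndex pair behind the scenes,
# so each row can hold a variable number of points
table.add_column(name='timestamps', description='time of each point cloud')
table.add_column(name='point_cloud', description='xyz points at this time step',
                 index=True)

table.add_row(timestamps=0.0, point_cloud=[[1., 1., 1.], [2., 2., 2.]])
table.add_row(timestamps=0.1, point_cloud=[[1., 2., 1.]])

print(table['point_cloud'][0])  # the two points of the first time step
```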