Basic Backend Configuration
This PR attempts to show the backend configuration options for a single output NWB file.
Currently running into issues with the SpikeGLX-Phy test pipeline, where the following error is thrown when using an adjusted NeuroConv branch.
[2024-04-05 12:06:50,020] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 135, in post
return get_backend_configuration(neuroconv_api.payload)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 676, in get_backend_configuration
configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=info["backend"])
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_backend_configuration.py", line 21, in get_default_backend_configuration
return BackendConfigurationClass.from_nwbfile(nwbfile=nwbfile)
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in from_nwbfile
dataset_configurations = {
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in <dictcomp>
dataset_configurations = {
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_dataset_configuration.py", line 127, in get_default_dataset_io_configurations
dataset_io_configuration = DatasetIOConfigurationClass.from_neurodata_object(
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py", line 263, in from_neurodata_object
raise NotImplementedError(
NotImplementedError: Unable to create a `DatasetIOConfiguration` for the dataset at 'units/unit_name/data' for neurodata object '<hdmf.common.table.VectorData object at 0x286ee2f40>' of type '<class 'hdmf.common.table.VectorData'>'!
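For reference, a minimal sketch of the call that hits this error (in the GUIDE pipeline the `nwbfile` is the converter-populated in-memory file; a bare `NWBFile` is used here only to show the API shape):

```python
from datetime import datetime

from pynwb import NWBFile
from neuroconv.tools.nwb_helpers import get_default_backend_configuration

# Stand-in for the converter-populated in-memory file used by the GUIDE.
nwbfile = NWBFile(
    session_description="demo",
    identifier="demo",
    session_start_time=datetime.now().astimezone(),
)

backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")

# Each entry maps a dataset location (e.g. 'units/spike_times/data') to its default
# chunking/compression; the 'units/unit_name/data' VectorData is the object that
# currently raises the NotImplementedError above.
for location, dataset_configuration in backend_configuration.dataset_configurations.items():
    print(location, dataset_configuration.chunk_shape)
```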
Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?
How should we handle the below inconsistency between stub and non-stub configurations?
[2024-04-09 17:08:56,487] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 137, in post
return get_backend_configuration(neuroconv_api.payload)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 760, in get_backend_configuration
configuration = create_file(info)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 746, in create_file
configure_dataset_backends(nwbfile, backend_configuration, configuration)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 693, in configure_dataset_backends
setattr(configuration.dataset_configurations[name], key, value)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/pydantic/main.py", line 792, in __setattr__
self.__pydantic_validator__.validate_assignment(self, name, value)
pydantic_core._pydantic_core.ValidationError: 1 validation error for HDF5DatasetIOConfiguration
Value error, Some dimensions of the chunk_shape=[788] exceed the buffer_shape=(159,) for dataset at location 'units/spike_times/data'! [type=value_error, input_value={'object_id': '1e57fe46-d...pression_options': None}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/value_error
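One way to sidestep this when applying a saved configuration to a stub file would be to clip the chunk shape to the stub's buffer shape before assignment; a minimal sketch of that idea (hypothetical helper, not existing GUIDE code):

```python
# Hypothetical guard: shrink each chunk axis so it fits inside the (smaller) stub
# dataset, satisfying pydantic's chunk_shape <= buffer_shape validation.
def clip_chunk_shape(chunk_shape, buffer_shape):
    return tuple(min(chunk, buffer) for chunk, buffer in zip(chunk_shape, buffer_shape))


# Full-size chunk shape vs. the truncated stub buffer from the error above.
print(clip_chunk_shape(chunk_shape=(788,), buffer_shape=(159,)))  # -> (159,)
```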
> Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?
Same as everything else: one file per session. Let the session manager allow individual file specification, but also allow a global default. (It would generally be odd to have a mix of backends, but there might be a reason to allow it, e.g., a special backend-specific compression method they want to use for a data stream that is only available in some subset of sessions; in general, though, there's no reason to force all sessions to be the same.)
> How should we handle the below inconsistency between stub and non-stub configurations?
Thinking more on it, there isn't much use in applying the backend configuration to a stub file. Neurosift ought to be agnostic to such things, aside from slight performance gains.
Gotcha. So we can move the configuration step to immediately before Conversion Review?
> Gotcha. So we can move the configuration step to immediately before Conversion Review?
Thematically I think it still fits after metadata but before inspection/visualization; it would be an odd break in momentum to go back to specifying file details after those steps.
Running into this error when attempting to configure the SpikeGLX V1 SingleProbe AP test pipeline after selecting Zarr as the backend in the form:
[2024-04-10 11:25:20,150] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1287, in __list_fill__
dset[:] = data # If data is an h5py.Dataset then this will copy the data
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1452, in __setitem__
self.set_basic_selection(pure_selection, value, fields=fields)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1548, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1938, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1965, in _set_selection
check_array_shape("value", value, sel_shape)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/util.py", line 552, in check_array_shape
raise ValueError(
ValueError: parameter 'value': expected array with shape (1155, 384), got ()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 137, in post
return get_backend_configuration(neuroconv_api.payload)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 761, in get_backend_configuration
configuration = create_file(info)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 750, in create_file
return configuration
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/contextlib.py", line 126, in __exit__
next(self.gen)
File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_metadata_and_file_helpers.py", line 207, in make_or_load_nwbfile
io.write(nwbfile)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 270, in write
super(ZarrIO, self).write(**kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/backends/io.py", line 99, in write
self.write_builder(f_builder, **kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 433, in write_builder
self.write_group(
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 523, in write_group
self.write_group(
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 533, in write_group
self.write_dataset(
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
return func(args[0], **pargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1108, in write_dataset
dset = self.__list_fill__(parent, name, data, options)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1292, in __list_fill__
dset[i] = data[i]
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/data_utils.py", line 1085, in __getitem__
return self.data[item]
TypeError: 'SpikeInterfaceRecordingDataChunkIterator' object is not subscriptable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/api.py", line 404, in wrapper
resp = resource(*args, **kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/views.py", line 109, in view
return current_app.ensure_sync(self.dispatch_request)(**kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/resource.py", line 46, in dispatch_request
resp = meth(*args, **kwargs)
File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 141, in post
neuroconv_api.abort(500, str(exception))
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/namespace.py", line 153, in abort
abort(*args, **kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/errors.py", line 28, in abort
flask.abort(code)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/helpers.py", line 277, in abort
current_app.aborter(code, *args, **kwargs)
File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/werkzeug/exceptions.py", line 861, in __call__
raise self.mapping[code](*args, **kwargs)
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
@CodyCBakerPhD I think I remember you saying that you could provide None on an axis to have it automatically specified. Is this correct?
I've swapped to specifying only the chunk shape, but I'm not sure how to specify the buffer shape so that it's handled automatically.
> @CodyCBakerPhD I think I remember you saying that you could provide None on an axis to have it automatically specified. Is this correct?
> I've swapped to specifying only the chunk shape, but I'm not sure how to specify the buffer shape so that it's handled automatically.
Not yet; I mentioned it as an eventual possibility, but it won't make the foreseeable deadlines, so leave out the possibility of None (no checkboxes for disabling an axis).
Chunk shape only is fine and reasonable; exposing only the buffer 'size' (buffer_gb) will automatically fit as many chunks into RAM as it can, up to that threshold.
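For illustration only (this is not NeuroConv's actual implementation), that buffer-fitting logic amounts to something like:

```python
import math

# Hypothetical sketch: stack whole chunks along the first axis until the buffer
# would exceed the requested size threshold in GiB.
def buffer_shape_from_buffer_gb(full_shape, chunk_shape, itemsize, buffer_gb=1.0):
    chunk_bytes = itemsize * math.prod(chunk_shape)
    max_chunks = max(int((buffer_gb * 1024**3) // chunk_bytes), 1)
    first_axis = min(chunk_shape[0] * max_chunks, full_shape[0])
    return (first_axis, *full_shape[1:])


# Example: an int16 ecephys-like array with ~1 s chunks and a 1 GiB buffer.
print(buffer_shape_from_buffer_gb(full_shape=(30_000_000, 384), chunk_shape=(30_000, 64), itemsize=2))
```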
@garrettmflynn Could you rebase this one while I continue work on the Flask side?
@garrettmflynn OK, these are the final merge conflicts and failing tests for the last major PR before release (or slightly post-release, if too much work is needed here).
Okay, this PR is at least functional now. To view the conversion progress, it seems we need to expose the conversion options to make_or_load_nwbfile, which is used when configuring the backend. Below is the relevant section in manage_neuroconv.py: https://github.com/NeurodataWithoutBorders/nwb-guide/blob/e54b0f58666a9a3909facf7887ed8d31bd4ff8a9/src/pyflask/manageNeuroconv/manage_neuroconv.py#L903-L908
Does this align with what you'd expect?
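For context, the pattern being discussed looks roughly like the sketch below, using NeuroConv's make_or_load_nwbfile, get_default_backend_configuration, and configure_backend helpers; `converter`, `metadata`, `conversion_options`, and `nwbfile_path` are placeholders for what the GUIDE assembles from the payload:

```python
from neuroconv.tools.nwb_helpers import (
    configure_backend,
    get_default_backend_configuration,
    make_or_load_nwbfile,
)

with make_or_load_nwbfile(
    nwbfile_path=nwbfile_path, metadata=metadata, overwrite=True, backend="hdf5"
) as nwbfile:
    # The conversion options (e.g. stub_test) would need to reach this call for the
    # reported progress to reflect the actual conversion settings.
    converter.add_to_nwbfile(nwbfile, metadata=metadata, conversion_options=conversion_options)

    backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
    configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)
```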
> Okay, this PR is at least functional now. To view the conversion progress, it seems we need to expose the conversion options to make_or_load_nwbfile, which is used when configuring the backend. Below is the relevant section in manage_neuroconv.py:
Thanks to recent efforts, this should not be needed; but it is also fine as an alternative strategy and refactoring could be done in a follow-up
I'll give this a try then
Any idea why tests are failing?
Tracking it down. Think I'm pretty close, we'll see how this commit is handled
Ah right. So now I can pass the configuration options as part of run_conversion directly? We should probably work that in here, since this essentially nullifies #778 at the moment.
> So now I can pass the configuration options as part of run_conversion directly?
That is correct
Similar to how you got the time alignment info, there could be get_ and set_ methods for aggregating payload information across pages and then passing it to the conversion call.
Whichever is easier for you to design
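Roughly, the direct route would then be a single call (assuming a NeuroConv version whose run_conversion accepts a backend_configuration, as discussed above; the other names are placeholders for what the GUIDE assembles):

```python
# Sketch only: `converter`, `nwbfile_path`, `metadata`, `conversion_options`, and the
# user-edited `backend_configuration` all come from the GUIDE's payload handling.
converter.run_conversion(
    nwbfile_path=nwbfile_path,
    metadata=metadata,
    overwrite=True,
    conversion_options=conversion_options,
    backend_configuration=backend_configuration,
)
```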
Some stylistic requests from the first pass through (which felt pretty good once I got it working BTW)
a) Should probably suppress the 'Zarr' backend option for the time being (but perhaps leave it commented out so it's easy to re-enable in the future)
Reasons include
- DANDI technically doesn't officially take NWB Zarr files yet (anything on that front is still experimental)
- NeuroConv doesn't have full tests running with Zarr
I know that leaves a dropdown with only a single element, though, so if you'd rather remove the dropdown altogether that might also be OK (and just hardcode 'hdf5' in the payload for now)
b) When displaying summary information about the source array (such as its shape), I think it would look better to separate the axis lengths with a space and an x, e.g., 1155 x 384
This is how HDFView does it as well
Also, on that same summary, can you use human-readable auto-sizing for the source array size (so that one would be on the MB scale)?
c) Possibly a bug, though I'm not sure at what level: if I do as the description says and leave the compression method blank, it still defaults to GZIP (also if I navigate back to this page after proceeding), so I was unable to disable compression
All of these should be fixed!
Just as context for the gzip option: it looks like I'm getting a schema with the default compression_method as gzip, so I've just had to remove this, as blank values would otherwise be set back to the schema default.
Yeah, it's one of those things...
gzip is definitely the initial value that should fill into all the options 'by default', but blank is allowed to map to None on the Python side, meaning no compression
Though I guess it might be even better to add an explicit option in the dropdown for 'No compression' - would you be able to inject that into the schema and map it to mean null/None when passing back to NeuroConv?
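A rough sketch of that mapping on the Python side (the helper and the option label are hypothetical, not existing GUIDE code):

```python
NO_COMPRESSION_LABEL = "No compression"  # hypothetical label injected into the schema enum


def resolve_compression_method(value):
    """Map the form value back to what NeuroConv expects (None disables compression)."""
    if value in (None, "", NO_COMPRESSION_LABEL):
        return None
    return value  # e.g. "gzip" passes through unchanged


assert resolve_compression_method("No compression") is None
assert resolve_compression_method("gzip") == "gzip"
```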
There's no inconsistency as far as I've seen, so the described behavior is what's happening now.
I can do this if you'd like, but I'm not 100% sure what the time estimate would be, so we might be better off leaving it as a follow-up.
Sure, a follow-up for 'no compression' if we have time later
But for now, do you have a way to auto-populate gzip while having the 'default' (if I remove the text and then click off) set to blank?
Yeah that's just what happens since there isn't a default or global value. gzip seems to just be passed as the compression method value for every configuration level.
Looks great!