
Basic Backend Configuration

garrettmflynn opened this pull request • 8 comments

This PR attempts to show the backend configuration options for a single output NWB file.

Currently running into issues with the SpikeGLX-Phy test pipeline, where the following error is thrown when using an adjusted NeuroConv branch.

[2024-04-05 12:06:50,020] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 135, in post
    return get_backend_configuration(neuroconv_api.payload)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 676, in get_backend_configuration
    configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=info["backend"])
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_backend_configuration.py", line 21, in get_default_backend_configuration
    return BackendConfigurationClass.from_nwbfile(nwbfile=nwbfile)
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in from_nwbfile
    dataset_configurations = {
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_backend.py", line 45, in <dictcomp>
    dataset_configurations = {
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_dataset_configuration.py", line 127, in get_default_dataset_io_configurations
    dataset_io_configuration = DatasetIOConfigurationClass.from_neurodata_object(
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py", line 263, in from_neurodata_object
    raise NotImplementedError(
NotImplementedError: Unable to create a `DatasetIOConfiguration` for the dataset at 'units/unit_name/data'for neurodata object '<hdmf.common.table.VectorData object at 0x286ee2f40>' of type '<class 'hdmf.common.table.VectorData'>'!

garrettmflynn avatar Apr 05 '24 23:04 garrettmflynn

Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?

garrettmflynn avatar Apr 09 '24 22:04 garrettmflynn

How should we handle the below inconsistency between stub and non-stub configurations?

[2024-04-09 17:08:56,487] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 137, in post
    return get_backend_configuration(neuroconv_api.payload)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 760, in get_backend_configuration
    configuration = create_file(info)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 746, in create_file
    configure_dataset_backends(nwbfile, backend_configuration, configuration)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 693, in configure_dataset_backends
    setattr(configuration.dataset_configurations[name], key, value)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/pydantic/main.py", line 792, in __setattr__
    self.__pydantic_validator__.validate_assignment(self, name, value)
pydantic_core._pydantic_core.ValidationError: 1 validation error for HDF5DatasetIOConfiguration
  Value error, Some dimensions of the chunk_shape=[788] exceed the buffer_shape=(159,) for dataset at location 'units/spike_times/data'! [type=value_error, input_value={'object_id': '1e57fe46-d...pression_options': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/value_error

garrettmflynn avatar Apr 10 '24 00:04 garrettmflynn
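For context, the validation fails because the saved chunk shape was computed from the full-length recording (788 spike times) while the stub file only contains 159, so a configuration captured from the full file cannot be applied verbatim to a stub. One possible guard is sketched below (hedged: the saved_chunk_shape value and the use of the configuration's full_shape field are assumptions; `configuration` is the stub file's default backend configuration, as in the traceback above):

# Hedged sketch: clamp a chunk shape recorded from the full-length file so it
# never exceeds the stub dataset before assigning it to the pydantic model,
# which validates that chunk_shape does not exceed buffer_shape.
saved_chunk_shape = (788,)  # recorded from the non-stub configuration

dataset_configuration = configuration.dataset_configurations["units/spike_times/data"]
clamped_chunk_shape = tuple(
    min(chunk_axis, full_axis)
    for chunk_axis, full_axis in zip(saved_chunk_shape, dataset_configuration.full_shape)
)
dataset_configuration.chunk_shape = clamped_chunk_shape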

Are we expecting that the user's choice of backend (i.e. hdf5 vs zarr) will be consistent across files or should we support selection of different backends for individual files?

Same as everything else: one file per session. Let the session manager allow individual file specification, but allow a global default. It would generally be odd to have a mix of backends, but there might be a reason to allow it if there's a special backend-specific compression method they want to use for a data stream that is only available in some subset of sessions. In general, though, there's no reason to force all sessions to be the same.

CodyCBakerPhD avatar Apr 10 '24 00:04 CodyCBakerPhD

How should we handle the below inconsistency between stub and non-stub configurations?

Thinking more on it, there isn't much use in applying the backend configuration to a stub file. Neurosift ought to be agnostic to such things aside from slight performance gains.

CodyCBakerPhD avatar Apr 10 '24 00:04 CodyCBakerPhD

Gotcha. So we can move the configuration step to immediately before Conversion Review?

garrettmflynn avatar Apr 10 '24 00:04 garrettmflynn

Gotcha. So we can move the configuration step to immediately before Conversion Review?

Thematically, I think it still fits after metadata but before inspection/visualization; it would be an odd break in momentum to go back to specifying file details after those steps.

CodyCBakerPhD avatar Apr 10 '24 01:04 CodyCBakerPhD

Running into this error when attempting to configure the SpikeGLX V1 SingleProbe AP test pipeline after selecting Zarr as the backend in the preform:


[2024-04-10 11:25:20,150] ERROR in app: Exception on /neuroconv/configuration [POST]
Traceback (most recent call last):
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1287, in __list_fill__
    dset[:] = data  # If data is an h5py.Dataset then this will copy the data
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1452, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1548, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1938, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/core.py", line 1965, in _set_selection
    check_array_shape("value", value, sel_shape)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/zarr/util.py", line 552, in check_array_shape
    raise ValueError(
ValueError: parameter 'value': expected array with shape (1155, 384), got ()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 137, in post
    return get_backend_configuration(neuroconv_api.payload)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 761, in get_backend_configuration
    configuration = create_file(info)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/manageNeuroconv/manage_neuroconv.py", line 750, in create_file
    return configuration
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/contextlib.py", line 126, in __exit__
    next(self.gen)
  File "/Users/garrettflynn/Documents/GitHub/neuroconv/src/neuroconv/tools/nwb_helpers/_metadata_and_file_helpers.py", line 207, in make_or_load_nwbfile
    io.write(nwbfile)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 270, in write
    super(ZarrIO, self).write(**kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/backends/io.py", line 99, in write
    self.write_builder(f_builder, **kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 433, in write_builder
    self.write_group(
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 523, in write_group
    self.write_group(
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 533, in write_group
    self.write_dataset(
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1108, in write_dataset
    dset = self.__list_fill__(parent, name, data, options)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf_zarr/backend.py", line 1292, in __list_fill__
    dset[i] = data[i]
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/hdmf/data_utils.py", line 1085, in __getitem__
    return self.data[item]
TypeError: 'SpikeInterfaceRecordingDataChunkIterator' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/api.py", line 404, in wrapper
    resp = resource(*args, **kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/views.py", line 109, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/resource.py", line 46, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/Users/garrettflynn/Documents/GitHub/nwb-guide/pyflask/apis/neuroconv.py", line 141, in post
    neuroconv_api.abort(500, str(exception))
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/namespace.py", line 153, in abort
    abort(*args, **kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask_restx/errors.py", line 28, in abort
    flask.abort(code)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/flask/helpers.py", line 277, in abort
    current_app.aborter(code, *args, **kwargs)
  File "/Users/garrettflynn/miniconda3/envs/nwb-guide/lib/python3.9/site-packages/werkzeug/exceptions.py", line 861, in __call__
    raise self.mapping[code](*args, **kwargs)
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

garrettmflynn avatar Apr 10 '24 18:04 garrettmflynn

@CodyCBakerPhD I think I remember you saying that you could provide None on an axis for it to be automatically specified. Is this correct?

Swapped to only specifying Chunk Shape, but I'm not sure how to specify Buffer Shape so that it's handled automatically.

garrettmflynn avatar Apr 25 '24 22:04 garrettmflynn

@CodyCBakerPhD I think I remember you saying that you could provide None on an axis for it to be automatically specified. Is this correct?

Swapped to only specifying Chunk Shape, but I'm not sure how to specify Buffer Shape so that it's handled automatically.

Not yet; I mentioned it as an eventual possibility, but it won't make foreseeable deadlines, so leave out the possibility of None (no checkboxes for disabling an axis).

Chunk shape only is fine and reasonable; only exposing buffer 'size' (buffer_gb) will automatically fit as many chunks into RAM as it can up to that threshold

CodyCBakerPhD avatar May 30 '24 19:05 CodyCBakerPhD
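For reference, a minimal sketch of the "chunk shape only" approach against the NeuroConv helpers seen in the tracebacks above (hedged: the mock file, series name, and chunk values are illustrative stand-ins, not the GUIDE's actual payload):

import numpy as np
from pynwb import TimeSeries
from pynwb.testing.mock.file import mock_NWBFile
from neuroconv.tools.nwb_helpers import configure_backend, get_default_backend_configuration

# Stand-in for the in-memory NWBFile produced by the conversion.
nwbfile = mock_NWBFile()
nwbfile.add_acquisition(
    TimeSeries(name="ElectricalSeriesAP", data=np.zeros((1155, 384), dtype="int16"), unit="volts", rate=30000.0)
)

backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")

# Override only the chunk shape chosen in the form; buffer_shape is left at its
# default so buffering stays automatic (as many chunks as fit in the RAM budget).
for location, dataset_configuration in backend_configuration.dataset_configurations.items():
    if location.endswith("ElectricalSeriesAP/data"):
        dataset_configuration.chunk_shape = (1155, 96)

# Apply the configuration to the in-memory file before writing.
configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)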

@garrettmflynn Could you rebase this one while I continue work on the Flask side?

CodyCBakerPhD avatar May 30 '24 19:05 CodyCBakerPhD

@garrettmflynn OK, final merge conflicts and failing tests for the last major PR before release (or slightly post-release if too much work is needed here)

CodyCBakerPhD avatar Jun 02 '24 21:06 CodyCBakerPhD

Okay this PR is at least functional now. To view the conversion progress, it seems that we need to expose conversion options to make_or_load_nwbfile, which is used when configuring the backend. Below is the relevant section in manage_neuroconv.py: https://github.com/NeurodataWithoutBorders/nwb-guide/blob/e54b0f58666a9a3909facf7887ed8d31bd4ff8a9/src/pyflask/manageNeuroconv/manage_neuroconv.py#L903-L908

Does this align with what you'd expect?

garrettmflynn avatar Jun 03 '24 14:06 garrettmflynn
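For anyone following along, the pattern under discussion looks roughly like the sketch below (hedged: converter, metadata, conversion_options, and nwbfile_path are assumed to come from the GUIDE payload, and the exact keyword surface of make_or_load_nwbfile may differ):

from neuroconv.tools.nwb_helpers import (
    configure_backend,
    get_default_backend_configuration,
    make_or_load_nwbfile,
)

with make_or_load_nwbfile(nwbfile_path=nwbfile_path, metadata=metadata, overwrite=True) as nwbfile:
    # The per-interface conversion options (e.g. stub_test) would need to be
    # threaded through to this call so the datasets exist before configuring.
    converter.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata, conversion_options=conversion_options)

    backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")
    configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)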

Okay this PR is at least functional now. To view the conversion progress, it seems that we need to expose conversion options to make_or_load_nwbfile, which is used when configuring the backend. Below is the relevant section in manage_neuroconv.py:

Thanks to recent efforts, this should not be needed, but it is also fine as an alternative strategy, and refactoring could be done in a follow-up.

I'll give this a try then

Any idea why tests are failing?

CodyCBakerPhD avatar Jun 03 '24 14:06 CodyCBakerPhD

Tracking it down. Think I'm pretty close, we'll see how this commit is handled

garrettmflynn avatar Jun 03 '24 15:06 garrettmflynn

Ah right. So now I can pass the configuration options as part of run_conversion directly? Should probably work that in here since this essentially nullifies #778 at the moment.

garrettmflynn avatar Jun 03 '24 16:06 garrettmflynn

So now I can pass the configuration options as part of run_conversion directly?

That is correct

Similar to how you got time alignment info, there could be a get_ and set_ method for aggregating payload information across pages then passing it to the conversion call

Whichever is easier for you to design

CodyCBakerPhD avatar Jun 03 '24 16:06 CodyCBakerPhD
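For reference, a hedged sketch of passing the configuration straight through run_conversion (the backend_configuration object is assumed to be aggregated from the GUIDE pages, e.g. via the get_/set_ methods mentioned above; converter, metadata, conversion_options, and nwbfile_path come from the existing payload handling):

# Sketch: apply the user's backend choices in a single conversion call rather
# than in a separate configuration pass.
converter.run_conversion(
    nwbfile_path=nwbfile_path,
    metadata=metadata,
    conversion_options=conversion_options,
    backend="hdf5",
    backend_configuration=backend_configuration,
    overwrite=True,
)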

Some stylistic requests from the first pass through (which felt pretty good once I got it working BTW)

a) Should probably suppress the 'Zarr' backend option for the time being (but perhaps leave it commented out so it's easy to re-enable in the future)

Reasons include

  • DANDI technically doesn't officially take NWB Zarr files yet (anything on that front is still experimental)
  • NeuroConv doesn't have full tests running with Zarr

I know that leaves a dropdown with only a single element, though, so if you'd rather remove the dropdown altogether that might also be OK (and just hardcode 'hdf5' in the payload for now)

b) When displaying summary information about the source array, such as

[screenshot: summary of the source array's shape and size]

I think it would look better to separate the axis lengths by a space and an x, e.g., 1155 x 384

This is how HDFView does it as well

Also, on that same summary, can you use human-readable auto-sizing for the source array size (so that one would be on the MB scale)?

c) Possibly a bug, though not sure on what level: if I do as the description says and leave the compression method blank, it still defaults to GZIP (also if I navigate back to this page after proceeding), so I was unable to disable compression

CodyCBakerPhD avatar Jun 04 '24 14:06 CodyCBakerPhD
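Regarding (b) above, a small illustrative sketch of the requested formatting (written in Python for brevity; illustrative only, and the int16 assumption for the byte count is hypothetical):

# Join axis lengths with " x " and auto-scale the byte count to a readable unit.
def format_shape(shape):
    return " x ".join(str(axis) for axis in shape)

def format_size(num_bytes):
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1000.0:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1000.0
    return f"{num_bytes:.2f} TB"

print(format_shape((1155, 384)))    # -> "1155 x 384"
print(format_size(1155 * 384 * 2))  # assuming int16 samples -> "887.04 KB"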

All of these should be fixed!

Just as context for the gzip option: it looks like I'm getting a schema with the default compression_method as gzip, so I've just had to remove this, as otherwise it will be set to the schema default.

garrettmflynn avatar Jun 04 '24 15:06 garrettmflynn

Yeah, it's one of those things...

gzip is definitely the initial value that should fill into all the options 'by default', but blank is allowed and equates on the Python side to None, meaning no compression.

Though I guess it might be even better to add an explicit option in the dropdown for 'No compression'. Would you be able to inject that into the schema and map it to null/None when passing back to NeuroConv?

CodyCBakerPhD avatar Jun 04 '24 16:06 CodyCBakerPhD
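A minimal sketch of that mapping on the Python side (hedged: the 'No compression' label and the blank value are hypothetical form inputs, not the GUIDE's actual field names):

def resolve_compression_method(form_value):
    """Map the GUIDE form selection to what NeuroConv expects, where None means no compression."""
    no_compression_labels = ("", "No compression", None)  # hypothetical UI labels
    return None if form_value in no_compression_labels else form_value

assert resolve_compression_method("") is None
assert resolve_compression_method("No compression") is None
assert resolve_compression_method("gzip") == "gzip"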

There's no inconsistency as far as I've seen, so the described behavior is what's happening now.

I can do this if you'd like, but I'm not 100% sure what the time estimate would be on it, so we might be better off with it as a follow-up.

garrettmflynn avatar Jun 04 '24 16:06 garrettmflynn

Sure, a follow-up for 'no compression' if we have time later

But for now, you have a way to auto-populate gzip but have the 'default' (if I remove the text then click off) set to blank?

CodyCBakerPhD avatar Jun 04 '24 16:06 CodyCBakerPhD

Yeah that's just what happens since there isn't a default or global value. gzip seems to just be passed as the compression method value for every configuration level.

garrettmflynn avatar Jun 04 '24 16:06 garrettmflynn

Looks great!

CodyCBakerPhD avatar Jun 05 '24 16:06 CodyCBakerPhD