matnwb
matnwb copied to clipboard
Add mixin class for data types with un-named (sub)group properties, e.g. ProcessingModule
Motivation
fix #625
How to test the behavior?
processingModule = types.core.ProcessingModule();
processingModule.add('MyTimeSeries', types.core.TimeSeries);
processingModule.MyTimeSeries
ans =
TimeSeries with properties:
starting_time_unit: 'seconds'
timestamps_interval: 1
timestamps_unit: 'seconds'
data: []
data_unit: ''
comments: 'no comments'
control: []
control_description: ''
data_continuity: ''
data_conversion: 1
data_offset: 0
data_resolution: -1
description: 'no description'
starting_time: []
starting_time_rate: []
timestamps: []
Todos:
- [ ] 1. warning on read if name aliases are used
- [x] 2.
.getsyntax - [x] 3. name mapping should be publicly accessible
- [ ] 4. how-to-guide
Cases to write unit tests for:
Edge cases mentioned in issue
Adding data types to a container type
- [x] Data type is added to a container, i.e
aProcessingModule.add('namedDataType')- [x] Is the data type added to the underlying types.untyped.Set?
- [x] Is the data type added as a dynamic property to the container type (i.e a ProcessingModule) on
nwbRead?
Removing data types from a container type
- [x] Test that data object is removed from underlying subgroup (
Setobject) when using remove on the parent data object - [x] Test that data object is removed from the parent data object if the object is removed from the subgroup (
Setobject)
Checklist
- [x] Have you ensured the PR description clearly describes the problem and solutions?
- [x] Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?
- [x] If this PR fixes an issue, is the first line of the PR description
fix #XXwhereXXis the issue number?
Codecov Report
:x: Patch coverage is 95.53073% with 24 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 95.70%. Comparing base (eeab63c) to head (0ba9d7a).
Additional details and impacted files
@@ Coverage Diff @@
## main #705 +/- ##
==========================================
+ Coverage 95.59% 95.70% +0.11%
==========================================
Files 184 189 +5
Lines 6444 6869 +425
==========================================
+ Hits 6160 6574 +414
- Misses 284 295 +11
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
This is currently possible in MatNWB, but creates an invalid group in the NWB-file where two different data type objects are merged in the same group:
nwbFile = tests.factory.NWBFile();
module = types.core.ProcessingModule();
nwbFile.processing.set('test', module);
T = struct2table(struct('name', {'a', 'b'}, 'value', {1, 2}));
dynamicTable = util.table2nwb(T);
nwbFile.processing.get('test').description = 'a test';
nwbFile.processing.get('test').nwbdatainterface.set('test', tests.factory.TimeSeriesWithTimestamps)
nwbFile.processing.get('test').dynamictable.set('test', dynamicTable)
nwbExport( nwbFile, 'test_same_name_in_different_subgroups.nwb')
Easy to prevent for this new syntax, but not for the legacy syntax.
Do we need all of this renaming logic? What if someone tries to add two objects, Time-Series and Time_Series?
Why can't we just change module.datainterfaces.get("Foo") to module.get("Foo") or module("Foo")? Then we don't need to mess with any of this name changing stuff. We can expose module.Foo if the name allows it, and if not then revert to the other syntax.
Do we need all of this renaming logic? What if someone tries to add two objects, Time-Series and Time_Series?
That will look like this:
module = types.core.ProcessingModule('description', 'a module with two timeseries');
module.add('time_series', types.core.TimeSeries)
module.add('time-series', types.core.TimeSeries)
>> module
module =
Warning: The following named elements of "ProcessingModule" are remapped to have valid MATLAB names, but will be written to file with
their actual names:
ValidName ActualName
_______________ _____________
"time_series_1" "time-series"
ProcessingModule with properties:
description: 'a module with two timeseries'
time_series_1: [1×1 types.core.TimeSeries]
time_series: [1×1 types.core.TimeSeries]
Which is actually quite similar to how MATLAB does it in e.g readtable.
given a test.csv file:
time-series time_series
1 2
3 4
>> readtable('test.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the
table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans =
2×2 table
time_series time_series_1
___________ _____________
1 2
3 4
Why can't we just change module.datainterfaces.get("Foo") to module.get("Foo") or module("Foo")? Then we don't need to mess with any of this name changing stuff. We can expose module.Foo if the name allows it, and if not then revert to the other syntax.
First, I would like to avoid using module.get because its not a very common MATLAB syntax, and it's difficult for a user to know when to use module.get and when to use standard dot-indexing, i.e module.desription.
Implementing parenthesis indexing is quite straightforward with the matlab.mixin.indexing.RedefinesParen class, but it is only supported from R2021b. It is possible to override subsref for compatibility in older releases, but I would really not recommend doing that.
Also, one syntax for "valid" names and another syntax for other names I think would be confusing.
In summary, the renaming logic would provide full support for reading files from pynwb where names are specified with spaces or or other symbols that are not supported in MATLAB but also take care of edge cases like the one you used as an example, and in general be easier to use as there would only be one type of indexing syntax to remember
You're not warning on the add command? I'm really not into magically renaming. Now the order in which the data elements are added to the file object matters and the user needs to learn the renaming rule. I think module.get('my-name') is so much simpler and sufficiently solves the problem without getting into tricky territory. I understand it's a bit unusual syntax but I've just been in too many situations where renaming causes more confusion than it is worth.
You're not warning on the add command?
Good point, there could be an extra warning on the add command.
I'm really not into magically renaming
It's not really renaming, just creating an alias which is used for the dynamic properties that enables dot-indexing and autocompletion. module.nwbdatainterface.get('my-name') would still work and creating a shortcut as module.get('my-name') would be a quick addition.
How about keeping/providing both modes?
A typical MATLAB user would likely specify names using valid camelCase or snake_case and not encounter the name remapping at all. If they loaded a file created with python using different naming convention, their preferred modes might vary. For example, I prefer to explore new objects in the command window, in which the alias-names are presented and where I can use them for dot-indexing. If someone else prefers to use the "real" names they could do so with module.get('my-name')
- [ ] 1. warning on read if name aliases are used
- [x] 2.
.getsyntax - [x] 3. name mapping should be publicly accessible
- [ ] 4. how-to-guide
This is the old syntax: nwb.processing.get('ophys').nwbdatainterface.get('Fluorescence').roiresponseseries.get('RoiResponseSeries') but what if you had an RoiResponseSeries named "roiresponseseries"? I think the only solution is in this case to require the syntax nwb.processing.get('ophys').nwbdatainterface.get('Fluorescence').roiresponseseries.get('roiresponseseries')