matnwb icon indicating copy to clipboard operation
matnwb copied to clipboard

Add mixin class for data types with un-named (sub)group properties, e.g. ProcessingModule

Open ehennestad opened this issue 7 months ago • 8 comments

Motivation

fix #625

How to test the behavior?

processingModule = types.core.ProcessingModule();
processingModule.add('MyTimeSeries', types.core.TimeSeries);
processingModule.MyTimeSeries
ans = 

  TimeSeries with properties:

     starting_time_unit: 'seconds'
    timestamps_interval: 1
        timestamps_unit: 'seconds'
                   data: []
              data_unit: ''
               comments: 'no comments'
                control: []
    control_description: ''
        data_continuity: ''
        data_conversion: 1
            data_offset: 0
        data_resolution: -1
            description: 'no description'
          starting_time: []
     starting_time_rate: []
             timestamps: []

Todos:

  • [ ] 1. warning on read if name aliases are used
  • [x] 2. .get syntax
  • [x] 3. name mapping should be publicly accessible
  • [ ] 4. how-to-guide

Cases to write unit tests for:

Edge cases mentioned in issue

Adding data types to a container type

  • [x] Data type is added to a container, i.e aProcessingModule.add('namedDataType')
    • [x] Is the data type added to the underlying types.untyped.Set?
  • [x] Is the data type added as a dynamic property to the container type (i.e a ProcessingModule) on nwbRead?

Removing data types from a container type

  • [x] Test that data object is removed from underlying subgroup (Set object) when using remove on the parent data object
  • [x] Test that data object is removed from the parent data object if the object is removed from the subgroup (Set object)

Checklist

  • [x] Have you ensured the PR description clearly describes the problem and solutions?
  • [x] Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?
  • [x] If this PR fixes an issue, is the first line of the PR description fix #XX where XX is the issue number?

ehennestad avatar May 01 '25 08:05 ehennestad

Codecov Report

:x: Patch coverage is 95.53073% with 24 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 95.70%. Comparing base (eeab63c) to head (0ba9d7a).

Files with missing lines Patch % Lines
+matnwb/+utility/DynamicPropertyManager.m 85.96% 8 Missing :warning:
+matnwb/+utility/NameRegistry.m 82.97% 8 Missing :warning:
+matnwb/+mixin/HasUnnamedGroups.m 97.04% 5 Missing :warning:
+types/+untyped/Set.m 98.42% 3 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #705      +/-   ##
==========================================
+ Coverage   95.59%   95.70%   +0.11%     
==========================================
  Files         184      189       +5     
  Lines        6444     6869     +425     
==========================================
+ Hits         6160     6574     +414     
- Misses        284      295      +11     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 01 '25 19:05 codecov[bot]

This is currently possible in MatNWB, but creates an invalid group in the NWB-file where two different data type objects are merged in the same group:

nwbFile = tests.factory.NWBFile();
module = types.core.ProcessingModule();

nwbFile.processing.set('test', module);

T = struct2table(struct('name', {'a', 'b'}, 'value', {1, 2}));
dynamicTable = util.table2nwb(T);

nwbFile.processing.get('test').description = 'a test';
nwbFile.processing.get('test').nwbdatainterface.set('test', tests.factory.TimeSeriesWithTimestamps)
nwbFile.processing.get('test').dynamictable.set('test', dynamicTable)

nwbExport( nwbFile, 'test_same_name_in_different_subgroups.nwb')

Easy to prevent for this new syntax, but not for the legacy syntax.

ehennestad avatar May 03 '25 10:05 ehennestad

Do we need all of this renaming logic? What if someone tries to add two objects, Time-Series and Time_Series?

Why can't we just change module.datainterfaces.get("Foo") to module.get("Foo") or module("Foo")? Then we don't need to mess with any of this name changing stuff. We can expose module.Foo if the name allows it, and if not then revert to the other syntax.

bendichter avatar May 05 '25 14:05 bendichter

Do we need all of this renaming logic? What if someone tries to add two objects, Time-Series and Time_Series?

That will look like this:

module = types.core.ProcessingModule('description', 'a module with two timeseries');
module.add('time_series', types.core.TimeSeries)
module.add('time-series', types.core.TimeSeries)

>> module

module = 

Warning: The following named elements of "ProcessingModule" are remapped to have valid MATLAB names, but will be written to file with
their actual names:
       ValidName        ActualName
    _______________    _____________

    "time_series_1"    "time-series"
 
  ProcessingModule with properties:

      description: 'a module with two timeseries'
    time_series_1: [1×1 types.core.TimeSeries]
      time_series: [1×1 types.core.TimeSeries]

Which is actually quite similar to how MATLAB does it in e.g readtable.

given a test.csv file:

time-series	time_series
1	2
3	4
>> readtable('test.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the
table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names. 

ans =

  2×2 table

    time_series    time_series_1
    ___________    _____________

         1               2      
         3               4      

Why can't we just change module.datainterfaces.get("Foo") to module.get("Foo") or module("Foo")? Then we don't need to mess with any of this name changing stuff. We can expose module.Foo if the name allows it, and if not then revert to the other syntax.

First, I would like to avoid using module.get because its not a very common MATLAB syntax, and it's difficult for a user to know when to use module.get and when to use standard dot-indexing, i.e module.desription.

Implementing parenthesis indexing is quite straightforward with the matlab.mixin.indexing.RedefinesParen class, but it is only supported from R2021b. It is possible to override subsref for compatibility in older releases, but I would really not recommend doing that.

Also, one syntax for "valid" names and another syntax for other names I think would be confusing.

In summary, the renaming logic would provide full support for reading files from pynwb where names are specified with spaces or or other symbols that are not supported in MATLAB but also take care of edge cases like the one you used as an example, and in general be easier to use as there would only be one type of indexing syntax to remember

ehennestad avatar May 05 '25 21:05 ehennestad

You're not warning on the add command? I'm really not into magically renaming. Now the order in which the data elements are added to the file object matters and the user needs to learn the renaming rule. I think module.get('my-name') is so much simpler and sufficiently solves the problem without getting into tricky territory. I understand it's a bit unusual syntax but I've just been in too many situations where renaming causes more confusion than it is worth.

bendichter avatar May 05 '25 21:05 bendichter

You're not warning on the add command?

Good point, there could be an extra warning on the add command.

I'm really not into magically renaming

It's not really renaming, just creating an alias which is used for the dynamic properties that enables dot-indexing and autocompletion. module.nwbdatainterface.get('my-name') would still work and creating a shortcut as module.get('my-name') would be a quick addition.

How about keeping/providing both modes?

A typical MATLAB user would likely specify names using valid camelCase or snake_case and not encounter the name remapping at all. If they loaded a file created with python using different naming convention, their preferred modes might vary. For example, I prefer to explore new objects in the command window, in which the alias-names are presented and where I can use them for dot-indexing. If someone else prefers to use the "real" names they could do so with module.get('my-name')

ehennestad avatar May 05 '25 22:05 ehennestad

  • [ ] 1. warning on read if name aliases are used
  • [x] 2. .get syntax
  • [x] 3. name mapping should be publicly accessible
  • [ ] 4. how-to-guide

ehennestad avatar May 08 '25 14:05 ehennestad

This is the old syntax: nwb.processing.get('ophys').nwbdatainterface.get('Fluorescence').roiresponseseries.get('RoiResponseSeries') but what if you had an RoiResponseSeries named "roiresponseseries"? I think the only solution is in this case to require the syntax nwb.processing.get('ophys').nwbdatainterface.get('Fluorescence').roiresponseseries.get('roiresponseseries')

bendichter avatar Jun 20 '25 14:06 bendichter