matnwb
matnwb copied to clipboard
failing validation of matnwb-generated files with extension in cached namespace
pynwb validation of matnwb-generated files fails when an extension is in the cached namespace. This is and independent issue from issue #349 and is not fixed by PR #350
To replicate:
generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');
nwb = NwbFile( ...
'session_description', 'mouse in open exploration', ...
'identifier', 'Mouse5_Day3', ...
'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);
cortical_surfaces = types.ndx_ecog.CorticalSurfaces;
parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ...
'data', randi(5, 1, 20), ...
'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);
nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');
$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb
Validating ndx-ecog_file_test_matnwb.nwb against cached namespace information using namespace 'ndx-ecog'.
Traceback (most recent call last):
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 229, in get_validator
return self.__validators[dt]
KeyError: 'NWBFile'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 136, in <module>
main()
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 130, in main
ret = ret or _validate_helper(io=io, namespace=ns)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 23, in _validate_helper
errors = validate(**kwargs)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 587, in func_call
return func(**pargs)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/__init__.py", line 198, in validate
return validator.validate(builder)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
return func(args[0], **pargs)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 247, in validate
validator = self.get_validator(dt)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
return func(args[0], **pargs)
File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 232, in get_validator
raise ValueError(msg)
ValueError: data type 'NWBFile' not found in namespace ndx-ecog
No issue when cached namespace is not used:
$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb --no-cached-namespace
Validating ndx-ecog_file_test_matnwb.nwb against pynwb namespace information using namespace 'core'.
- no errors found.
Doing some printing. It looks like pynwb is trying to use key 'NWBFile' to access an entry in a dictionary that has the relevant subset of classes as keys (e.g., Data
, NWBContainer
, Subject
). This is an incorrect behavior.
When successfully validating a file without an extension in the cached namespace, the 'NWBFile' keys is used to a access an entry in a dictionary with all core and hdmf-common classes as keys (e.g., NWBFile
, Subject
, DynamicTable
)
Issue seems to be in how matnwb caches the namespace of the extension. Compare how the namespace cached by matnwb:
import h5py
import json
from pprint import pprint
f = h5py.File('ndx-ecog_file_test_matnwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))
{'namespaces': [{'author': ['Ben Dichter'],
'contact': ['[email protected]'],
'doc': 'An NWB extension for storing the cortical surface of '
'subjects in ECoG experiments.',
'name': 'ndx-ecog',
'schema': [{'namespace': 'core',
'neurodata_types': ['NWBDataInterface',
'Subject',
'Images',
'NWBData']},
{'source': 'ndx-ecog.extensions'}],
'version': '0.3.1'}]}
with how the namespace is cached by pynwb
f = h5py.File('ndx-ecog_file_test_pynwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))
{'namespaces': [{'author': ['Ben Dichter'],
'contact': ['[email protected]'],
'doc': 'An NWB extension for storing the cortical surface of '
'subjects in ECoG experiments.',
'name': 'ndx-ecog',
'schema': [{'namespace': 'core'},
{'source': 'ndx-ecog.extensions'}],
'version': '0.3.1'}]}
Note that pynwb does not cache the neurodata_types
key in schema
. Modifying matnwb caching to exclude this key resolves the issue. Opening a PR with this change.
@rly can you clarify why caching the neurodata_types key is a problem and why PyNWB does not cache it?
This may be a bug in the PyNWB/HDMF validator. I thought this was resolved in https://github.com/hdmf-dev/hdmf/pull/613 . @cechava can you share that test file that you created?
Separately, PyNWB/HDMF should cache the neurodata_types
key. That is a bug.
Sure. The code to generate both files is below. They both make sure of the ndx-ecog extension
python
from pynwb import load_namespaces, get_class, NWBHDF5IO, NWBFile
from datetime import datetime
import numpy as np
from ndx_ecog import CorticalSurfaces, ECoGSubject, Surface, Parcellations
nwbfile = NWBFile('my first synthetic recording', 'EXAMPLE_ID', datetime.now())
cortex_module = nwbfile.create_processing_module(name='cortex',
description='description')
with NWBHDF5IO('ndx-ecog_file_test_pynwb.nwb', 'w') as io:
io.write(nwbfile)
MATLAB
generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');
nwb = NwbFile( ...
'session_description', 'mouse in open exploration', ...
'identifier', 'Mouse5_Day3', ...
'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);
cortical_surfaces = types.ndx_ecog.CorticalSurfaces;
parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ...
'data', randi(5, 1, 20), ...
'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);
nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');
@rly Just realized you were asking for the actual files. To make life easy here are links to download: matnwb-generated file pynwb-generated file