pynwb icon indicating copy to clipboard operation
pynwb copied to clipboard

Generalize Validation

Open CodyCBakerPhD opened this issue 1 year ago • 14 comments

fix #1431 replaces #1494 by branching from dev instead of the modularity improvement

Motivation

WIP synchronizing the behavior of the CLI pynwb.validate function with the library-imported one, namely for the purposes of validating against cached namespaces (but also just to have identical behavior between library and CLI use cases).

How to test the behavior?

Need to...

  • [x] Thoroughly test that the CLI usage behaves as it used to
  • [x] The cached namespace feature is indeed available to the new library method as well
  • [x] Ensure there is no break in back-combability when using the previous methods

Checklist

  • [x] Did you update CHANGELOG.md with your changes?
  • [X] Have you checked our Contributing document?
  • [X] Have you ensured the PR clearly describes the problem and the solution?
  • [X] Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • [X] Have you checked to ensure that there aren't other open Pull Requests for the same change? - a splinter/extension of enh/getcachednamespaces
  • [X] Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

CodyCBakerPhD avatar Jul 18 '22 18:07 CodyCBakerPhD

Ah, looks like you are still working on this. Happy to help or review it when ready.

rly avatar Aug 11 '22 16:08 rly

@rly @oruebel This is ready to be looked at...

Main issue I'm having right now is getting the non-required argument parsing through docval to work.

Traceback (most recent call last):
  File "C:\Users\Raven\.conda\envs\pynwb\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Raven\.conda\envs\pynwb\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\github\pynwb\src\pynwb\validate.py", line 223, in <module>
    validate_cli()
  File "d:\github\pynwb\src\pynwb\validate.py", line 213, in validate_cli
    validation_errors, validation_status = validate(
  File "C:\Users\Raven\.conda\envs\pynwb\lib\site-packages\hdmf\utils.py", line 648, in func_call
    pargs = _check_args(args, kwargs)
  File "C:\Users\Raven\.conda\envs\pynwb\lib\site-packages\hdmf\utils.py", line 637, in _check_args
    raise ExceptionType(msg)
TypeError: validate: incorrect type for 'namespace' (got 'NoneType', expected 'str or NoneType')

CodyCBakerPhD avatar Sep 01 '22 18:09 CodyCBakerPhD

but likewise if I remove the type: (..., type(None)) to mimic the pattern of https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/image.py#L38-L43 I still get

Traceback (most recent call last):
  File "C:\Users\Raven\.conda\envs\pynwb\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Raven\.conda\envs\pynwb\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\github\pynwb\src\pynwb\validate.py", line 223, in <module>
    validate_cli()
  File "d:\github\pynwb\src\pynwb\validate.py", line 213, in validate_cli
    validation_errors, validation_status = validate(
  File "C:\Users\Raven\.conda\envs\pynwb\lib\site-packages\hdmf\utils.py", line 648, in func_call
    pargs = _check_args(args, kwargs)
  File "C:\Users\Raven\.conda\envs\pynwb\lib\site-packages\hdmf\utils.py", line 637, in _check_args
    raise ExceptionType(msg)
TypeError: validate: incorrect type for 'namespace' (got 'NoneType', expected 'str')

CodyCBakerPhD avatar Sep 01 '22 18:09 CodyCBakerPhD

Main issue I'm having right now is getting the non-required argument parsing through docval to work.

Never mind, I just had to set the CLI parser default value to resolve this.

CodyCBakerPhD avatar Sep 02 '22 16:09 CodyCBakerPhD

@rly @oruebel Now I'm trying to debug these failing validation CLI tests, but when I locally run the tests it gives

   .... Massive stack trace....
               FileNotFoundError: [Errno 2] No such file or directory: 'coverage'

How do I obtain this 'coverage' folder with what I presume contains NWB files I need to test against?

CodyCBakerPhD avatar Sep 02 '22 17:09 CodyCBakerPhD

I fixed the tests. This looks good to me! Thanks, @CodyCBakerPhD ! Let me know if you think it is good to merge.

rly avatar Sep 15 '22 00:09 rly

@rly Thanks a bunch! I would maybe like to triple check that the API call is 100% compatible with the NWB Inspector (since that also optionally performs validation before the best practices check) which could also clean up a bit of code in the dandi-cli. I'll look into this early next week and let you know

CodyCBakerPhD avatar Sep 15 '22 02:09 CodyCBakerPhD

Had to make one small adjustment after testing a thorough integration with the NWBInspector (https://github.com/NeurodataWithoutBorders/nwbinspector/pull/265): https://github.com/NeurodataWithoutBorders/pynwb/pull/1511/commits/be4c65e5ff79dbefe937970a778bff21bb80dc66

When testing validation on a file that had an extension namespace cached within it, that extension itself extended core and so validation against cached namespaces would end up in the condition `namespace 'core' is contained in ', please validate against that instead.'

CodyCBakerPhD avatar Sep 20 '22 15:09 CodyCBakerPhD

@CodyCBakerPhD I was writing a test for the cached extension condition but I was confused about the last commit.

Currently with this branch, if I have a file with a cached extension that extends core, and I run:

python -m pynwb.validate tests/back_compat/2.1.0_nwbfile_with_extension.nwb --ns core

Then it will validate with the highest cached namespaces and not just "core". I think this is misleading. Previously, this would raise an error saying that "core" is included in namespace "x" and that we should validate against "x" instead.

However, I note that

python -m pynwb.validate tests/back_compat/2.1.0_nwbfile_with_extension.nwb --ns core
python -m pynwb.validate tests/back_compat/2.1.0_nwbfile_with_extension.nwb

are equivalent. I think the no argument condition should validate against all cached namespaces. So if there is an extension that extends "core", it would validate against that and not "core" which is included in the extension. This would be a breaking change.

What do you think?

rly avatar Sep 20 '22 22:09 rly

I think the no argument condition should validate against all cached namespaces. So if there is an extension that extends "core", it would validate against that and not "core" which is included in the extension. This would be a breaking change.

What do you think?

I agree, that does make the most sense for a default argument. Would you be OK with a change like that? I was trying to avoid changing anything major like that.

CodyCBakerPhD avatar Sep 21 '22 02:09 CodyCBakerPhD

It's a pretty major breaking change in the behavior of the validate function and CLI. For now, I suggest adding a new flag "--all" to validate against all cached namespaces, reverting your previous change, and using "--all" in the inspector. Then, in a separate PR we can prepare making "--all" the default and changing the name of the file to validation.py.

rly avatar Sep 21 '22 04:09 rly

@rly OK it was a false alert - the source of the issue was actually that as of a few commits ago I had unintentionally changed the default structure of the namespaces argument which was including the 'core' namespace even though the logic on dev handles this by only including 'core' if there is nothing else cached.

Reverting to mirror more of the behavior currently on dev resolved the issue without needing that specific skip logic for core.

As far as including a flag for all, I think I'll leave that for the future if it's decided that it's still desired/needed, currently I'd say the logic of the lines https://github.com/NeurodataWithoutBorders/pynwb/blob/generalize_validate_from_dev/src/pynwb/validate.py#L66-L68 is sufficient to account for overlapping namespaces that aren't 'core'.

CodyCBakerPhD avatar Sep 21 '22 14:09 CodyCBakerPhD

Also thanks for adding that test example, it perfectly replicates what I was seeing on my own test file with that structure.

I determined the 'correct' behavior for the cases - that is, the CLI works as it does on dev (returns the 'core' is contained in 'ndx-...' use that instead), using the API validate with paths+verbose=True does the same thing the CLI does (the new intended behavior for this PR) but the API validate with the previous io+namspace usage behaves as it used to for maintaining back-compatibility (it explicitly performs the validation against the specified namespace without complaining).

CodyCBakerPhD avatar Sep 21 '22 15:09 CodyCBakerPhD

Codecov Report

Merging #1511 (06ca152) into dev (6e1f131) will increase coverage by 0.68%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #1511      +/-   ##
==========================================
+ Coverage   90.65%   91.34%   +0.68%     
==========================================
  Files          25       25              
  Lines        2494     2507      +13     
  Branches      466      471       +5     
==========================================
+ Hits         2261     2290      +29     
+ Misses        148      137      -11     
+ Partials       85       80       -5     
Flag Coverage Δ
integration 70.56% <100.00%> (+0.95%) :arrow_up:
unit 84.44% <24.71%> (+0.28%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pynwb/__init__.py 76.15% <100.00%> (-1.22%) :arrow_down:
src/pynwb/validate.py 100.00% <100.00%> (+21.05%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 27 '22 22:09 codecov[bot]

Coverage on the new validate.py is now 100%. Woot!

rly avatar Oct 19 '22 00:10 rly

Thanks @CodyCBakerPhD and @rly for all the work on this PR to clean up validation!

oruebel avatar Oct 19 '22 00:10 oruebel