bids-validator icon indicating copy to clipboard operation
bids-validator copied to clipboard

[WIP][ENH] Validate data sets following BEP 029 Motion

Open sjeung opened this issue 2 years ago • 14 comments

Adding BEP 029 for motion data to validator. Work in progress

sjeung avatar Mar 22 '22 17:03 sjeung

Hi @rwblair, so far I have been trying to follow the document on contributing to BIDS validator and changed the .json schema and regexps for file names. Do you have any pointers regarding how to proceed from here? Unfortunately I have no background in javascript and I could not test the changes properly. Changing the schema files does not seem to be reflected on validation directly. Any type of help will be much appreciated. Thanks in advance!

sjeung avatar Mar 22 '22 17:03 sjeung

@sjeung you were very close, couple of naming issues and the filename regex started working. For the json, the schema had to be called in bids-validator/validators/json/json.js, I went ahead and added that.

For testing I used bids-standard/bids-examples#285 Were you able to use npm to install all the dependencies for the project? If those are good to go, the quickest way to call the validator with your changes is by calling bids-validator/bin/bids-validator from the command line.

rwblair avatar Mar 22 '22 20:03 rwblair

Codecov Report

Patch coverage: 94.59% and project coverage change: +0.10 :tada:

Comparison is base (f895b78) 83.17% compared to head (47d68e3) 83.28%.

:exclamation: Current head 47d68e3 differs from pull request most recent head 547c4b1. Consider uploading reports for the commit 547c4b1 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1434      +/-   ##
==========================================
+ Coverage   83.17%   83.28%   +0.10%     
==========================================
  Files          91       92       +1     
  Lines        3763     3799      +36     
  Branches     1158     1171      +13     
==========================================
+ Hits         3130     3164      +34     
- Misses        535      537       +2     
  Partials       98       98              
Impacted Files Coverage Δ
bids-validator/validators/tsv/checkTypeCol.js 100.00% <ø> (ø)
bids-validator/utils/type.js 88.38% <84.61%> (-0.21%) :arrow_down:
bids-validator/validators/json/json.js 100.00% <100.00%> (ø)
...s-validator/validators/tsv/checkMotionComponent.js 100.00% <100.00%> (ø)
bids-validator/validators/tsv/tsv.js 94.11% <100.00%> (+0.25%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Mar 22 '22 20:03 codecov[bot]

Hello @rwblair! Thank you so much for the touch-up work. I found some missing fields in the validator and fixed issues with the example data along with reflecting changes in the spec itself. Now the new version of "motion_spotrotation" data set in the same PR will pass the validator with minor warnings (undefined columns in motion.tsv, inter-subject mismatch, missing readme). The other two motion example data sets need updating, and we also will remove larger files that prevent the data from going on main repo.

I think the only modality specific issue at the moment is the headers in motion.tsv files. These are not metadata files and contains column names which correspond to channel names. The validator returns a warning because they are not defined in any other metadata file (but match entries in channels.tsv).

Ideally the validator will make motion data an exception and not display this warning. Do you have any recommendation on whether we should implement this fix?

sjeung avatar May 17 '22 09:05 sjeung

Hello @rwblair, I'd appreciate some help from you here, because I am stuck at the task of testing my validator version. With the older version that I used to run back in March I am getting dependency related error messages. I tried to merge your recent changes but the merge conflict seems quite complicated to resolve.

I have considered just copying the upstream master branch and modify the json schema and regexp again, but I don't want to undo the changes you have made specifically for motion branch.

Could you suggest me a solution? Or it will be wonderful if you have time for a short zoom call to instruct me how to go about this. Thanks :)

sjeung avatar Jul 14 '22 10:07 sjeung

@sjeung I went ahead and did the rebase with master, I can churn through them pretty quick (even though I made a couple mistakes).

I'll review the validator and examples PR to see where exactly we are and what else we might want to do and how to fix the tests.

rwblair avatar Jul 14 '22 16:07 rwblair

@sjeung I went ahead and did the rebase with master, I can churn through them pretty quick (even though I made a couple mistakes).

I'll review the validator and examples PR to see where exactly we are and what else we might want to do and how to fix the tests.

Thanks a lot, to give you a heads-up, the warnings about headers in motion.tsv can be ignored now, because we decided to remove headers in motion.tsv data files (https://github.com/bids-standard/bids-specification/issues/1134).

The regexp for filenames is not up-to-date, because I was not able to properly test. The change to be implemented is about allowing "tracksys-" key-value pair in file names of motion.json and channels.tsv. I can implement this once you are done.

Let me know if there is any further unclarity regarding data format.

sjeung avatar Jul 15 '22 14:07 sjeung

Hello @rwblair, hope you are doing well! Are there any questions about the motion spec? There is an additional motion + EEG example data that is being reviewed. Me and @JuliusWelzel are still discussing the headers in data file, and we are now again inclined to including them in motion.tsv files. But, no matter what we decide for, the priority is to have file names and .json files properly checked, I think.

sjeung avatar Aug 07 '22 20:08 sjeung

@sjeung Sorry for dropping this! I've reread the current specification PR and updated the validator as best I could. Few things I noticed:

  • ~~ExternalSoftwareVersions is present in the validator but not the specification~~ I was wrong about this one.
  • TrackingSystemCount was required in validator, I removed it from the required list. It is not listed in the specification.
  • SamplingFrequencyEffective is listed as required in validator but recommended in specification.
  • The list of acceptable type values in the channels.tsv does not cover some of the types listed in the dual task example. I added the type column check to the validator and the values listed in spec to the list of acceptable type column values. We can skip this check if enumerating them won't be helpful.
  • I added any field that was missing in the validator but listed in the spec. Also wrote enumerations for RotationRule and RotationOrder.
  • Updated StartTime to allow n/a since dual task had n/a's for some starttimes.

My todos:

  • [ ] Make new generic error for missing tsv columns. Currently throwing error 131 which is unrelated (just happened to be the error code for the section of code I copied)
  • [ ] I think it would make sense to add the tracksys entity and label to session and top level file regex, if there are no objections I'll do that.

Can you try running this against the motion examples and let me know what you think.

rwblair avatar Aug 08 '22 18:08 rwblair

Hey @rwblair,

thanks for the work! I just tried running the validator locally and got the following error:

../node_modules/bids-validator/validators/json/schemas/motion.json Module parse failed: Unexpected token a in JSON at position 2059 while parsing near '...tartTime": { anyOf: [ {"...' You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders SyntaxError: Unexpected token a in JSON at position 2059 while parsing near '...tartTime": { anyOf: [ {"...' at JSON.parse ()

Probably a quick fix for you?

Regarding the todos: I am +1 to add the tracksy entity to the top level file regex!

JuliusWelzel avatar Aug 09 '22 14:08 JuliusWelzel

gah! Sorry about that, should always test again no matter how small the change is. Should be fixed in latest commit.

rwblair avatar Aug 09 '22 14:08 rwblair

gah! Sorry about that, should always test again no matter how small the change is. Should be fixed in latest commit.

Issue fixed - thanks!

JuliusWelzel avatar Aug 09 '22 14:08 JuliusWelzel

Thanks a lot @rwblair , looks great.

To reply to your previous comment -

TrackingSystemCount was required in validator, I removed it from the required list. It is not listed in the specification.

  • Right, it is not supposed to be in the validator anymore.

SamplingFrequencyEffective is listed as required in validator but recommended in specification.

  • Also following the spec it should be made "recommended"

The list of acceptable type values in the channels.tsv does not cover some of the types listed in the dual task example. I added the type column check to the validator and the values listed in spec to the list of acceptable type column values. We can skip this check if enumerating them won't be helpful.

  • We do have a defined list of channel types and this was a mistake in the example data set. I will update the example so that the confidence channel is of type "MISC" instead.

I added any field that was missing in the validator but listed in the spec. Also wrote enumerations for RotationRule and RotationOrder.

  • Thank you!

Updated StartTime to allow n/a since dual task had n/a's for some starttimes.

  • From my side it seems fine, but is there any general recommendation for situations like this? The field itself is only "recommended" and it is possible for .json to not have this field at all, instead of entering n/a

Make new generic error for missing tsv columns. Currently throwing error 131 which is unrelated (just happened to be the error code for the section of code I copied)

  • Does this refer to warning code 82 - "CUSTOM_COLUMN_WITHOUT_DESCRIPTION"? If possible would be great to completely skip this for motion.tsv files.

I think it would make sense to add the tracksys entity and label to session and top level file regex, if there are no objections I'll do that.

  • You are right, please go ahead. :)

I will then modify the channel types in dual task data set, and change the spec so that motion.tsv is allowed to have headers.

sjeung avatar Aug 13 '22 17:08 sjeung

Hi @rwblair, both examples should now check out without irrelevant errors or warnings in the validator output. Relevant error & warning

  • the channel type validation for channels.tsv is not working (maybe taking types from EEG/MEG), resulting in an error
  • the warning related to undefined headers

sjeung avatar Aug 25 '22 21:08 sjeung

Hello @rwblair, I have one question about keyword check of channels.tsv entries. I do see that the keywords for column type is defined, but is there a check for restricted keywords of other columns?

For instance we have restricted keywords (x,y,z,quat_x,quat_y ...) for column "component" of channels.tsv and so does BIDS-NIRS but I could not figure out where in the validator this is defined. If anything, this would be the only change to be made to catch up with the up-to-date version of the spec. Thanks!

sjeung avatar Feb 03 '23 08:02 sjeung

@sjeung I'm not seeing any checks for the orientation_component in the node validator. We didn't have something like json schema to make this easy so any checks on values outside of the headers of tsv files had to be coded up manually.

I see those enumerations of values in the bids-specification schema so the new schema based validator will at least be able to take care of them.

I pushed a couple of commits to implement this. If they look good and work as you'd expect we should be good. I'll keep on poking at the tests to get them to pass.

rwblair avatar Feb 14 '23 15:02 rwblair

@rwblair @sappelhoff @JuliusWelzel
So validator seems to work fine when these three data sets passing without any error other than empty files one, and warning for undefined columns (see #1627).

Is there anything I can do regarding the checks that are still failing? (I see three of them)

  • With the schema and the example data set change above, I'd say again that we are ready to go. Do you review the schema as a PR as well?

sjeung avatar Feb 17 '23 17:02 sjeung

@sjeung I think you can ignore the Deno tests.

The other two failing tests are due to the code coverage, meaning you have not written enough tests for the new lines of code you introduced with this PR.

One easy test to write is for the channels file. Have a look at how NIRS does it here, ... I think you can pretty much adapt this for motion and write a few lines right below:

https://github.com/bids-standard/bids-validator/blob/e9e69512f7465ec62d850ad27e7fd2fe627ef3fc/bids-validator/tests/tsv.spec.js#L609-L631

Please have a look at the developer documentation so that you can properly test your code locally:

  1. https://github.com/bids-standard/bids-validator/blob/master/CONTRIBUTING.md#using-the-development-version-of-bids-validator
  2. https://github.com/bids-standard/bids-validator#testing

other than that, I agree that this is pretty much done. We basically need to wait a tiny bit longer until the discussion in:

  • https://github.com/bids-standard/bids-specification/pull/981

is done ... because there might be adjustments that need to be transferred to the validator and examples. And then we're ready to start the community review.

sappelhoff avatar Feb 20 '23 08:02 sappelhoff

@rwblair thanks for the eslint fix, I was quite puzzled by this ...!

sjeung avatar Feb 21 '23 16:02 sjeung

No problem, should really be added as an option to the package.json. Full command I ran was: eslint --fix \"./bids-validator/{tests,utils,validators}/**/*.js\"

No need to worry about deno tests, they're failing on master right now as well.

rwblair avatar Feb 21 '23 16:02 rwblair

Tested with bids-standard/bids-examples#348, everything looks good to me. We have a maintainers call later today so I'll wait until after that to pull the trigger on the merge.

rwblair avatar Mar 09 '23 15:03 rwblair