bids-validator icon indicating copy to clipboard operation
bids-validator copied to clipboard

Ignore files in _scans.tsv that correspond to entries in .bidsignore (#1366)

Open appukuttan-shailesh opened this issue 1 year ago • 12 comments

PR for handling #1366

Makes use of session storage to save the contents of .bidsignore when it is read originally. This is retrieved later while evaluating _scans.tsv files for ignoring the matching files. All tests with npm run test are passing (except two which were failing even on the unchanged master branch), and two new tests have been added.

For testing purposes, a mock session storage was required to be implemented as Jest is currently configured with "testEnvironment": "node". Could possibly be avoided if set to "testEnvironment": "jsdom". Another option is to employ jest-localstorage-mock package. But the current mock implementation seemed the simplest, and so went with it.

Two tests have been added:

  • should ignore files in scans.tsv that correspond to entries in .bidsignore This checks that files to be ignored, based on .bidsignore, are being ignored when evaluating the file list in _scans.tsv

  • should not allow missing files listed in scans.tsv and not accounted for by .bidsignore This catches files that are listed in _scans.tsv but are actually missing, and not accounted for by .bidsignore

First PR here; apologies if I have missed anything. @effigies: Could you take a look?

appukuttan-shailesh avatar Mar 09 '24 16:03 appukuttan-shailesh

In bidscoin, I have added extra lines of code to remove those bidsignore files from the scans.tsv file because the bids-validator was complaining about them. Since the specs don't mention anything about this (at least not that I'm aware of), I simply took the validator output as the definition of the standard. So does this mean that I should not have added those extra lines of code to bidscoin and should just ignore the validator here?

marcelzwiers avatar Mar 19 '24 15:03 marcelzwiers

@marcelzwiers I would call this undefined behavior. The spec didn't say what to do, and bidsignore is not a spec-level concept, so the interaction turned out to be a problem.

As a practical matter, I think following the validator when it exceeds the specification is a good idea unless your goal is to force the issue and aim your users to complain to the validator issue tracker. I wouldn't consider that a bug, but you could stop removing the lines, once this or a similar solution is released.

effigies avatar Mar 19 '24 16:03 effigies

Codecov Report

Attention: Patch coverage is 91.30435% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 85.52%. Comparing base (d626096) to head (6dacdea). Report is 12 commits behind head on master.

Files Patch % Lines
bids-validator/utils/getSessionStorage.js 87.50% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1914      +/-   ##
==========================================
+ Coverage   83.57%   85.52%   +1.94%     
==========================================
  Files          92      132      +40     
  Lines        3890     6285    +2395     
  Branches     1271     1549     +278     
==========================================
+ Hits         3251     5375    +2124     
- Misses        541      806     +265     
- Partials       98      104       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 21 '24 15:03 codecov[bot]

 	1: [ERR] Internal error. SOME VALIDATION STEPS MAY NOT HAVE OCCURRED (code: 0 - INTERNAL ERROR)
		Evidence: ReferenceError: sessionStorage is not defined
    at TSV (/home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/tsv.js:617:23)
    at /home/runner/work/bids-validator/bids-validator/bids-validator/validators/tsv/validate.js:33:9

I'm not sure if global storage is the right tool for this job. @rwblair @nellh Your insight would be appreciated here. I don't think I'm qualified to review this PR.

effigies avatar Mar 21 '24 15:03 effigies

I did have my doubts about using sessionStorage for solving this problem, and it was infact a fallback approach to get the web app working. I believe the JS web app and the Python package might be using partly the same codebase (?), and if so this will not suffice. My first approach was to re-open the .bidsignore file when analyzing the _scans.tsv file, but this was problematic as .bidsignore is no longer accessible at that stage of the code (i.e. inside tsv.js), and couldn't force read owing to browser security related issues.

Looking forward to some feedback (@rwblair @nellh ) and suggestions on what would be a useful way to tackle this. Happy to try out a different approach.

appukuttan-shailesh avatar Mar 21 '24 16:03 appukuttan-shailesh

Please ignore the Python package. It is only useful for packaging regular expressions for validating filenames. It will be replaced entirely in the future.

effigies avatar Mar 21 '24 16:03 effigies

Just trying to debug the remaining errors with the current approach while waiting for further feedback.

Locally, I am testing the following (based on instructions here) before pushing the changes:

  • npm run test
  • npm run lint

But the errors that have been reported above (through the workflows) are never caught/tested via the above. Are there additional instructions for running the entire testing pipeline locally?

I have manually ran the following to ensure previous workflow errors do not arise again:

  • bids-validator/bin/bids-validator bids-validator/tests/data/valid_headers/ --ignoreNiftiHeaders
  • ../../../bin/bids-validator 7t_trt -c /home/shailesh/gits/bids-validator/bids-validator/tests/data/bids-examples/bidsconfig.json --ignoreNiftiHeaders

5 workflows await approval from maintainers to check if all is clear this time around.

appukuttan-shailesh avatar Mar 22 '24 12:03 appukuttan-shailesh

In addition to the discussions above about the nature of the changes, I think the PR is passing most of the checks. 3 of them that it is failing, is found to have failed with similar status in a recent merged PR as well, so not sure if it's something specific to this PR that I should look into. I have made a small change to address the 4th failed test which dealt with code coverage; the current change should marginally improve that.

appukuttan-shailesh avatar Mar 25 '24 10:03 appukuttan-shailesh

Could we run the remaining 5 workflows to see if they pass?

5 workflows awaiting approval

Also, happy to receive any feedback regarding the changes here.

appukuttan-shailesh avatar Apr 08 '24 07:04 appukuttan-shailesh

As mentioned above, I believe it is now passing all the checks that a previous merged PR conformed to. So not sure if the 3 that failed, and 1 skipped, are something that I should look into.

appukuttan-shailesh avatar Apr 08 '24 08:04 appukuttan-shailesh

I was wondering if we had any feedback on this PR.

appukuttan-shailesh avatar May 07 '24 13:05 appukuttan-shailesh

I still think we want the following cases:

File exists File is valid File is ignored Result
T T * Pass
T F T Pass
T F F Fail
F * * Fail

It seems like a regression to me to stop erroring when a user mentions a file that does not exist. The goal is to

I spent a little time trying to figure out how to pass ignored files around. I got this far:

❯ git --no-pager diff
diff --git a/bids-validator/validators/bids/fullTest.js b/bids-validator/validators/bids/fullTest.js
index edf0b405..efcd6204 100644
--- a/bids-validator/validators/bids/fullTest.js
+++ b/bids-validator/validators/bids/fullTest.js
@@ -76,8 +76,10 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
   }
 
   // remove ignored files from list:
+  const ignoredFiles = {}
   Object.keys(fileList).forEach(function (key) {
     if (fileList[key].ignore) {
+      ignoredFiles[key] = fileList[key]
       delete fileList[key]
     }
   })
@@ -123,6 +125,7 @@ const fullTest = (fileList, options, annexed, dir, schema, callback) => {
         participants,
         phenotypeParticipants,
         stimuli,
+        ignoredFiles,
       )
     })
     .then(({ tsvIssues, participantsTsvContent }) => {
diff --git a/bids-validator/validators/tsv/tsv.js b/bids-validator/validators/tsv/tsv.js
index 0b76e6cd..153b62b7 100644
--- a/bids-validator/validators/tsv/tsv.js
+++ b/bids-validator/validators/tsv/tsv.js
@@ -36,7 +36,7 @@ const filenameEvidence = (filename) => `Filename: ${filename}`
  * specification.
  */
 
-const TSV = (file, contents, fileList, callback) => {
+const TSV = (file, contents, fileList, ignoredFiles, callback) => {
   const issues = []
   const stimPaths = []
   if (contents.includes('\r') && !contents.includes('\n')) {
@@ -628,7 +628,7 @@ const TSV = (file, contents, fileList, callback) => {
         const scanFullPath = scanDirPath + '/' + scanRelativePath
 
         // check if file should be ignored based on .bidsignore content
-        if (ig && ig.ignores(path.relative('/', scanRelativePath))) {
+        if (ignoredFiles.includes(scanFullPath)) {
           continue
         }
 
diff --git a/bids-validator/validators/tsv/validate.js b/bids-validator/validators/tsv/validate.js
index dba18e7a..dfcd725c 100644
--- a/bids-validator/validators/tsv/validate.js
+++ b/bids-validator/validators/tsv/validate.js
@@ -9,6 +9,7 @@ const validate = (
   participants,
   phenotypeParticipants,
   stimuli,
+  ignoredFiles,
   annexed,
   dir,
 ) => {
@@ -34,6 +35,7 @@ const validate = (
           file,
           contents,
           fileList,
+          ignoredFiles,
           function (tsvIssues, participantList, stimFiles) {
             if (participantList) {
               if (file.name.endsWith('participants.tsv')) {

Something's not working right, but I don't understand the validator enough to know what. Perhaps this will help you, though?

effigies avatar May 07 '24 14:05 effigies