bids-validator icon indicating copy to clipboard operation
bids-validator copied to clipboard

Need exception for "Empty File error" for CTF datasets

Open sappelhoff opened this issue 6 years ago • 30 comments

The validator is now raising an error when empty files are encountered.

For the CTF system (MEG), it is customary to have a bunch of files in a directory ending with .ds. Two of these files are BadChannels and bad.Segments and they are always written - even if no bad channels or bad segments are present. Thus, perfectly fine CTF data can contain empty files.

How should we deal with this? A special exception for empty files for BadChannels and bad.Segments files if they are found within a .ds MEG directory?

sappelhoff avatar Nov 14 '18 20:11 sappelhoff

Sounds reasonable to me.

Best, Chris

On Wed, Nov 14, 2018 at 12:33 PM Stefan Appelhoff [email protected] wrote:

The validator is now raising an error when empty files are encountered.

For the CTF system (MEG), it is customary to have a bunch of files in a directory ending with .ds. Two of these files are BadChannels and bad.Segments and they are always written - even if no bad channels or bad segments are present. Thus, perfectly fine CTF data can contain empty files.

How should we deal with this? A special exception for empty files for BadChannels and bad.Segments files if they are found within a .ds MEG directory?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bids-standard/bids-validator/issues/651, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOkp9TGTPOYsH-FejCe0wzEfyPVOiZ0ks5uvH42gaJpZM4Yeh1F .

chrisgorgo avatar Nov 14 '18 21:11 chrisgorgo

actually, how do we deal with all the empty files in https://github.com/bids-standard/bids-examples ?

These will all fail the validator as soon as we update the version in Travis: https://github.com/bids-standard/bids-examples/blob/d9ff5e0c021453002efb40ff4b128e563e88fcd5/.travis.yml#L1-L6

@chrisfilo

sappelhoff avatar Nov 29 '18 12:11 sappelhoff

Perhaps as a fix for the examples in bids-standard/bids-examples, we can tell travis to always validate using:

bids-validator -c {"ignore": [99]}

to ignore the EMPTY_FILE error. However, we'd still need a list for CTF files BadChannels and bad.Segments

sappelhoff avatar Dec 03 '18 17:12 sappelhoff

Sounds reasonable for this particular context.

chrisgorgo avatar Dec 05 '18 04:12 chrisgorgo

The empty file error for the examples has been fixed this way: https://github.com/bids-standard/bids-examples/pull/137

(should have opened a separate issue though ...)


Main issue of here (CTF data) still persists.

sappelhoff avatar Dec 11 '18 08:12 sappelhoff

has there been any progress on this front @sappelhoff @chrisfilo ? I need this to progress on a BIDS reader for MNE

jasmainak avatar Jan 28 '19 20:01 jasmainak

I am unsure, which lines to change. Potential candidates are:

https://github.com/bids-standard/bids-validator/blob/4fd11fbd7c562ea7e390a2c7c73e429d506681a0/utils/files/testFile.js#L43

I suggest adding logic along the lines (pseudocode):

if (stats.size === 0) and NOT (file.name.contains ('BadChannels' or 'bad.Segments')) ... raise empty file error

However, there is also:

https://github.com/bids-standard/bids-validator/blob/e5b3816f0ada8495c46f249b9f203f401ff51ec5/utils/files/validateMisc.js#L3-L6

I find this second code part quite hard to understand ... potentially we have to add something for both code parts.

sappelhoff avatar Jan 31 '19 20:01 sappelhoff

okay I see. I will try to see if I can use the workaround:

bids-validator -c {"ignore": [99]}

for now

jasmainak avatar Jan 31 '19 22:01 jasmainak

That workaround was a proposal that I made, it actually does not work right now ...

the -c argument should be a path to a JSON file.

See: https://github.com/bids-standard/bids-validator/blob/e5b3816f0ada8495c46f249b9f203f401ff51ec5/utils/options.js#L18-L71

It should be easy to add some lines to check if the string can be converted to a valid JavaScript Object ... if yes: take that as a config ... else, try to use it as a file name ... if that doesn't work either: error

sappelhoff avatar Feb 01 '19 08:02 sappelhoff

The more I look at it, I am getting the impression that the code is actually already intended to support JSON strings such as bids-validator -c {"ignore": [99]} instead of only file paths such as bids-validator -c bidsconfig.json

Specifically, I am wondering, whether these lines were intended to capture JSON strings:

https://github.com/bids-standard/bids-validator/blob/e5b3816f0ada8495c46f249b9f203f401ff51ec5/utils/options.js#L42-L44

but it seems to me that these lines are never reached, because all command line inputs qualify as strings, ... or is this only reached if no config is specified, and the input is thus the default {} as defined here:

???

Some guidance would be appreciated @chrisfilo @nellh

would this be a valid approach? Note that I am adding the first if clause

  loadConfig: function(config, callback) {

    if (config.startsWith('{') and config.endsWith('}')) {
        callback(null, { path: 'config' }, JSON.stringify(config))
    }

    else if (typeof config === 'string') {
      // load file
      files
        .readFile({ path: config })
        .then(contents => {
          callback(null, { path: config }, contents)
        })
        .catch(issue => {
          callback([issue], { path: config }, null)
        })
    } else if (typeof config === 'object') {
      callback(null, { path: 'config' }, JSON.stringify(config))
    }
  },

sappelhoff avatar Feb 01 '19 21:02 sappelhoff

@sappelhoff Indeed it should work if you specify it like --config.ignore=99 --config.ignore=44. Note you have to include at least two, since it expects an array for ignore configuration.

nellh avatar Feb 02 '19 00:02 nellh

Thanks @nellh! I will make a PR to make this obvious in the documentation as well.

sappelhoff avatar Feb 02 '19 09:02 sappelhoff

Hello! The .eeg file inside CTF .ds folders can also be empty. Could it also be added as an exception?

Moo-Marc avatar Feb 21 '19 18:02 Moo-Marc

Hi @Moo-Marc - thanks for reporting this.

Regarding "valid empty CTF files", would the list be complete after we include the .eeg file? Or are there some other potentially unknown candidates that we can include in one go?

sappelhoff avatar Feb 22 '19 13:02 sappelhoff

I've also seen some backup files that are created by CTF software that can be empty: e.g. bad.segments.bak, or *.eeg.bak. I believe that's it for potentially empty files.

Note that I asked CTF which files are considered "required" as per their dataset standard, and none of those empty files are. So they could in principle be deleted. But they are still valid. The required files are: .meg4, .res4, .infods, .newds, .hc

Moo-Marc avatar Feb 25 '19 17:02 Moo-Marc

It seems to me that you should use .bidsignore in this case

jasmainak avatar Feb 25 '19 17:02 jasmainak

Do you mean just for the .bak files? Note that they show up quite often, even for data that was not really processed in any way. Just renaming a dataset or adding eeg channel locations, or visualizing the raw data with the CTF viewer can generate these files. I even think they might sometimes be created by the acquisition software. So I consider them fully valid "raw" dataset files.

By the way, I searched the BIDS spec for the word "empty" and found nowhere that empty files were not allowed. May I ask where this is specified?

Moo-Marc avatar Feb 25 '19 18:02 Moo-Marc

By the way, I searched the BIDS spec for the word "empty" and found nowhere that empty files were not allowed. May I ask where this is specified?

As far as I know, this is not explicitly documented in our BIDS spec. I suspect that the EMPTYFILE check was introduced to catch erroneously empty files.

Sounds quite nightmarish with all these files that can be valid and empty in CTF. We could of course simply extend the list here:

https://github.com/bids-standard/bids-validator/blob/b2f0dd3e6a37eb6caa2df0e0b16f535942636536/bids-validator/utils/files/validateMisc.js#L6-L8

We would add eeg and all .bak versions of these files

What are your opinions on this?

sappelhoff avatar Feb 25 '19 19:02 sappelhoff

I'd suggest adding *.eeg and *.bak to the exception as an immediate step. But maybe there needs to be a broader discussion about the check itself.

Moo-Marc avatar Feb 25 '19 19:02 Moo-Marc

I'd suggest adding *.eeg and *.bak to the exception as an immediate step

I think that adding a wildcard * would not be a good idea. We should really be able to pin down the exact file names. Otherwise we could run into problems with other data formats. for example the BrainVision format has a binary datafile that ends with .eeg ... so using *.eeg would also catch these. And they should truly never be empty.

I was thinking about:

const exceptionMiscs = ['BadChannels', 'bad.segments', '.eeg', 'BadChannels.bak', 'bad.segments.bak', '.eeg.bak']

would that not be enough?

sappelhoff avatar Feb 25 '19 19:02 sappelhoff

Sorry, .eeg is only the extension. The actual file name for the CTF .eeg file is for example: sub-0001_ses-0001_task-rest_run-01_meg.ds/sub-0001_ses-0001_task-rest_run-01_meg.eeg It must have the same name as the .ds folder (as discussed in issue 692). So if not a wildcard, then the check would need to be more complex.

I'm curious, was there at some point a decision that the BIDS validator would also validate the raw data formats themselves? It can obviously be useful, but seems to me to go beyond the primary objective and would require a lot more effort considering the plethora of file formats.

Moo-Marc avatar Feb 25 '19 19:02 Moo-Marc

I think it's kind of easy in the MRI world because they agreed upon nifti as the accepted format. It's then useful for the validator to check if the header of nifti agrees with the sidecar json files. There is also a library to read nifti files in javascript.

Unfortunately, in the case of EEG/iEEG/MEG, a single file format did not really happen. The code in the bids validator was originally written for MRI data. But now it's become a challenge because there aren't equivalent javascript readers for these data types. I agree that this is a big maintenance burden.

jasmainak avatar Feb 25 '19 22:02 jasmainak

But wouldn't it be easy to just edit the .bidsignore? I can see why it can be useful to hardcode it in the code, but this could easily become a problem if we had more exceptions in other file formats ...

jasmainak avatar Feb 25 '19 22:02 jasmainak

I'd rather suggest not enforcing this empty file rule by default. I think it's fine from my perspective that this tool would not fully validate all data formats in all modalities. But then we shouldn't have any constraints on those files. For those specific modalities or file formats that you wish to validate further, then the empty file check could be applied on those only, as appropriate for those formats. I don't think users should have to add valid files for any accepted format to .bidsignore since they don't actually conflict with the BIDS standard. Does that seem reasonable?

Moo-Marc avatar Feb 25 '19 22:02 Moo-Marc

I see two separate issues here:

  1. Should the validator ignore .bak files?

  2. Should the validator raise an error for empty files by default with some exceptions or only check a selected set of extensions for content?

1.@Moo-Marc you should probably open a new issue for this. I don't think this extension should be added to the specification since .bak files are temporary and redundant in nature, but there is a precedence in the validator to ignore certain file types (such as hidden dot-files). 2.There are two aspects of this. First the number of file types where empty files are perfectly valid vs the number of file types where empty files are a sign of some sort of an issue. I believe there are far fewer files types where empty file is a meaningful valid file. Second - the engineering cost of switching the existing default to the opposite (as in someone needs to contribute and test this code). Considering those two options I would support adding .eeg to the list of extensions that are allowed to be empty.

chrisgorgo avatar Feb 26 '19 01:02 chrisgorgo

1. Should the validator ignore .bak files

Thinking again, I would say yes. The validator should not treat files inside .ds separately because it's a multifile system whose internals are not detailed in the specification. Anything inside the .ds folder may be considered raw data (?) It seems more reasonable that the non-zero size rule should be only for folder as a whole not to individual files. In MNE too, we point directly to the root of the directory, not to individual files. Taking this view would remove the necessity to hardcode exceptions (and thus potential for bugs). What do you think?

jasmainak avatar Feb 26 '19 04:02 jasmainak

The validator should not treat files inside .ds separately because it's a multifile system whose internals are not detailed in the specification.

Are .bak files only expected to be in .ds folders?

It seems more reasonable that the non-zero size rule should be only for folder as a whole not to individual files.

If you are talking about any folder and any file I disagree. An empty file such as sub-01_T1w.nii.gz is definitely a sign of something going wrong with the dataset preparation. The current does catch such situation. What I do find sensible is to ignore the contents of .ds folders.

chrisgorgo avatar Feb 26 '19 05:02 chrisgorgo

I don't think users should have to add valid files for any accepted format to .bidsignore since they don't actually conflict with the BIDS standard. Does that seem reasonable?

yes, sounds reasonable. I would not suggest to resolve the present issue through .bidsignore alone

What I do find sensible is to ignore the contents of .ds folders.

I agree with this. That would solve our CTF specific problems.

A separate suggestion that I'd find sensible is to make the EMPTYFILE error a warning instead of an error.

One of these two suggestions could be a solution. I am in favor of ignoring the CTF .ds folder until somebody comes along and implements a CTF-reader in Node/JS, which can then take over the checking load :-)

sappelhoff avatar Feb 26 '19 08:02 sappelhoff

Are .bak files only expected to be in .ds folders?

Yes for CTF datasets. I agree with everything in the last post. If someone implements a CTF validator, then it could be used later on. I'll mention here that I have a Matlab CTF to BIDS converter, but I'm unfamiliar with JS.

Moo-Marc avatar Feb 26 '19 16:02 Moo-Marc

If you are talking about any folder and any file I disagree. An empty file such as sub-01_T1w.nii.gz is definitely a sign of something going wrong with the dataset preparation. The current does catch such situation.

Sorry for not being clear. I meant specifically in the case of .ds folders. It's a bit like your .nii.gz file. One does not unzip them and check the size of the files inside

until somebody comes along and implements a CTF-reader in Node/JS

let's write a fif reader and an edf reader first for which code exists :)

jasmainak avatar Feb 27 '19 04:02 jasmainak