Error 44 on good files

Open ins0mniac2 opened this issue 6 years ago • 35 comments

Following up on the thread https://github.com/bids-standard/bids-validator/issues/671, version 1.1.1 continues to have this issue. Certain files that look good and are viewable in a viewer like ITK-SNAP (see attached) are flagged as

"We were unable to read this file. Make sure it contains data (fileSize > 0 kB) and is not corrupted, incorrectly named, or incorrectly symlinked."

ins0mniac2 avatar Dec 15 '18 17:12 ins0mniac2

Could you share the files?

chrisgorgo avatar Dec 15 '18 17:12 chrisgorgo

Or at least one example

chrisgorgo avatar Dec 15 '18 17:12 chrisgorgo

We can't share the original file, unfortunately.

@sandhitsu, can you empty the image (zero out the voxels) and check it again? If bids-validator still complains, we can share the empty image with @chrisfilo.

dorianps avatar Dec 15 '18 17:12 dorianps

If the empty file resolves the problem, then the problem is the original file. We can share the original file, but only privately with @chrisfilo (not online), and we need to change the subject ID to some random number. We need to respect the data agreements to the letter, unfortunately.

dorianps avatar Dec 15 '18 17:12 dorianps

Zeroing out the image does not solve the problem. Here is the zero image. sub-XXXX_ses-01_run-01_T1w.nii.gz

ins0mniac2 avatar Dec 15 '18 18:12 ins0mniac2

Could not replicate:

me@christop ~/Downloads $ node --version
v11.4.0
me@christop ~/Downloads $ bids-validator --version
1.1.1
me@christop ~/Downloads $ bids-validator test_ds
This dataset appears to be BIDS compatible.
        Summary:                  Available Tasks:        Available Modalities:
        2 Files, 51.94KB                                  T1w
        1 - Subject
        1 - Session

If you have any questions please post on https://neurostars.org/tags/bids

test_ds.zip

Perhaps a permissions issue, or something to do with a network drive?

P.S. Storing personally identifiable information in participant labels is a bad and dangerous practice. I edited your original post to remove the screenshot, which included the participant label.

chrisgorgo avatar Dec 15 '18 18:12 chrisgorgo

OOPS I thought I cut the title bar so that the ID won't show, must have been in the info tab then. Sorry about that. I'll download your zip file test. Thanks for looking into this.

ins0mniac2 avatar Dec 15 '18 18:12 ins0mniac2

The downloaded zip file passes validation. Here is what is happening. When I copy the subject directory somewhere to make a single subject directory tree, it passes, but as part of the larger dataset, it fails.

ins0mniac2 avatar Dec 15 '18 18:12 ins0mniac2

The software is pretty fast checking thousands of files. I don't know the details of how disk I/O is implemented, but this will sound weird -- it feels like either the software or the disk runs out of gas!

ins0mniac2 avatar Dec 15 '18 19:12 ins0mniac2

Sandy, you checked the permissions, right? You also checked a dataset with a single subject and it didn't work with that nifti file? What's not clear is how come the empty image does not work if simply copying the original image works. The empty image is just a copied image.

dorianps avatar Dec 15 '18 20:12 dorianps

It's not that the empty image works and the original doesn't. The same file passes validation if it's part of a single-subject tree and fails if it's part of the larger dataset. This is true for both the empty and the non-empty image.

ins0mniac2 avatar Dec 15 '18 20:12 ins0mniac2

Following up on this after further experimentation. Turns out that the file reading issue mentioned above happens with the standalone command line version 1.1.1, but not with the docker version. To summarize, some images are not readable (empty or non-empty doesn't matter) by the command line version ONLY when they are part of a larger dataset tree. In a single subject dataset, there is no error. I have copied the data onto multiple mounted filesystems and that doesn't matter.

ins0mniac2 avatar Jan 14 '19 05:01 ins0mniac2

Is there a difference in the version of Node.js between the container and bare metal installation?

What do you mean by "they are part of a larger dataset tree"?

Could you share the Docker command you used?

chrisgorgo avatar Jan 14 '19 05:01 chrisgorgo

I'll have to get back to you on your first question later.

What I mean is that the same subject directory tree passes validation when it is validated alone, that is, when it is the only subject in the dataset tree. However, it throws an error when the same subject directory is one of many. A few weeks ago, when you couldn't replicate the error with the sample dataset I provided, this is what happened: I couldn't replicate it either when testing it alone as a single-subject dataset, but when I put it back into the larger dataset, the error reappeared.

I am using

docker run -ti --rm -v $PWD:/data:ro bids/validator --verbose /data

ins0mniac2 avatar Jan 14 '19 05:01 ins0mniac2

Could you try

docker run -ti --rm -v $PWD:/data:ro bids/validator:1.1.1 --verbose /data

?

chrisgorgo avatar Jan 14 '19 05:01 chrisgorgo

I did. It appeared to download and then run. Did not throw the error.

ins0mniac2 avatar Jan 14 '19 05:01 ins0mniac2

The local Node.js is at v10.15.0. We upgraded it recently to make sure that is not the problem.

Not sure how to find the node version within the docker image. None of these works:

[dorian@chdimri ~]$ docker run -ti bids/validator:latest node --version
0.0.0
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 node --version
0.0.0
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 node
node does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 "node --version"
node --version does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 /bin/bash
/bin/bash does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 /bin/bash -c "node --version"
/bin/bash does not exist
[dorian@chdimri ~]$ docker run -ti --rm bids/validator:1.1.1 /bin/bash -c "node --version"
/bin/bash does not exist
[dorian@chdimri ~]$ node --version
v10.15.0

The docker file shows that you are using node image v8.11.3.

dorianps avatar Jan 14 '19 15:01 dorianps

docker run -ti --rm --entrypoint=node bids/validator:1.1.1 --version

chrisgorgo avatar Jan 14 '19 15:01 chrisgorgo

[dorian@chdimri ~]$ docker run -ti --rm --entrypoint=node bids/validator:1.1.1 --version
Unable to find image 'bids/validator:1.1.1' locally
1.1.1: Pulling from bids/validator
a073c86ecf9e: Pull complete
becc6a89816a: Pull complete
fa183c3e7c21: Pull complete
e2dada1dea71: Pull complete
df496f65d26c: Pull complete
Digest: sha256:66c42b3748d6dcf4f64cc85d0821870a5b3d87882b3cdee22ca4ce1b052425bd
Status: Downloaded newer image for bids/validator:1.1.1
v8.11.3

dorianps avatar Jan 14 '19 15:01 dorianps

Well, I am not sure what is going on. Things should work with 10.15. Unfortunately, I cannot replicate your issues locally.

chrisgorgo avatar Jan 14 '19 15:01 chrisgorgo

Obviously the best would be to give you the data, but that may require weeks of preparation of user agreements and signatures. Do you think an interactive session via webex can help, where we type in the commands you need to investigate the source of the problem?

dorianps avatar Jan 14 '19 15:01 dorianps

It would help, but unfortunately, I do not have the resources to provide that level of user support. Best stick with docker for now. Maybe another user will run into this and be able to share the data. Maybe the next refactoring will make this issue go away.

chrisgorgo avatar Jan 14 '19 15:01 chrisgorgo

Just an update.

Since docker is working and the local npm install is not, we tried broadening the permissions of the npm installation folders and of the data folder. None of this worked.

We then observed that only 1281 nifti files out of 5000+ files show an access error, and of course, all are there and accessible. From the subject ID we could guess that file access was interrupted during the run at some point. To verify whether this is the issue, we split the dataset in two parts and, surprise, everything works fine.

So, this is definitely a file access issue when thousands of files are accessed rapidly. This can probably be attributed to python. I am not sure how to test this, or whether python 3 would be compatible with npm/bids-validator. So far I have tried the system python 2.7.5 and python 2.7.14 which came with Anaconda 2.

@chrisfilo , any idea on this? Have you ever tried to test datasets with many files (i.e., 5000+) ?

dorianps avatar Jan 15 '19 17:01 dorianps

bids-validator is written in javascript and does not use python

chrisgorgo avatar Jan 15 '19 18:01 chrisgorgo

Try https://github.com/bids-standard/bids-validator/issues/677#issuecomment-454534054

On Tue, Jan 15, 2019, 5:32 PM dorianps wrote:

Tried a few more things, but haven't found the source of the problem yet.

  • Ran gunzip on the whole dataset to convert all .nii.gz files to .nii, thus avoiding the need to decompress with zlib on the fly. No solution, bids-validator produces Error 44: file access error.
  • Split the new dataset composed of .nii files into two datasets. This time bids-validator successfully passes each individual dataset, same as with .nii.gz files.
  • Uninstalled bids-validator and nodejs v10.15 to install the older nodejs v8.11.4 and bids-validator on top of it. No solution, the full dataset gives an error.
  • Uninstalled nodejs v8.11.4 and installed latest development of nodejs v11.6, then installed bids-validator. No solution, the full dataset gives an error. Even tried running things as root, to make sure there is no permission error.

This is just a sample of the last validation attempts on the full dataset and the split datasets:

[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019/
    1: [ERR] Not a valid JSON file. (code: 27 - JSON_INVALID)
            ./sub-R733967895/ses-05/anat/sub-R733967895_ses-05_run-01_T2w.json

    2: [ERR] We were unable to read this file. Make sure it contains data (fileSize > 0 kB) and is not corrupted, incorrectly named, or incorrectly symlinked. (code: 44 - FILE_READ)
            ./sub-R733967895/ses-05/anat/sub-R733967895_ses-05_run-01_T2w.nii
            ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-01_T1w.nii
            ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-01_T2w.nii
            ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-02_T1w.nii
            ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-01_T1w.nii
            ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-01_T2w.nii
            ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-02_T1w.nii
            ./sub-R735322929/ses-03/anat/sub-R735322929_ses-03_run-01_T2w.nii
            ./sub-R735322929/ses-03/anat/sub-R735322929_ses-03_run-02_T1w.nii
            ./sub-R735322929/ses-05/anat/sub-R735322929_ses-05_run-01_T1w.nii
            ... and 1282 more files having this issue (Use --verbose to see them all).

    Summary:                     Available Tasks:        Available Modalities:
    10740 Files, 113.03GB                                T1w
    445 - Subjects                                       T2w
    7 - Sessions

If you have any questions please post on https://neurostars.org/tags/bids

[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019Pack1/
This dataset appears to be BIDS compatible.
    Summary:                     Available Tasks:        Available Modalities:
    5199 Files, 55.67GB                                  T1w
    213 - Subjects                                       T2w
    7 - Sessions

If you have any questions please post on https://neurostars.org/tags/bids

[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019Pack2/
This dataset appears to be BIDS compatible.
    Summary:                     Available Tasks:        Available Modalities:
    5537 Files, 57.35GB                                  T1w
    232 - Subjects                                       T2w
    7 - Sessions

chrisgorgo avatar Jan 15 '19 23:01 chrisgorgo

For what it's worth, I verified that the NIfTI headers can be read for all NIfTI files in the BIDS folder. Tried both PrintHeader from ANTs and c3d from ITK-SNAP.

[dorian@chdimri dorian]$ time find /DATA/dorian/converted_2019/ -iname '*.nii' -exec sh -c 'PrintHeader {} | grep Bounding' \; > /DATA/dorian/PrintHeader_test.log

real    10m5.339s
user    4m40.305s
sys     6m4.294s

[dorian@chdimri dorian]$ time find /DATA/dorian/converted_2019/ -iname '*.nii' -exec sh -c 'c3d {} -info' \; > /DATA/dorian/C3D_test.log

real    14m9.197s
user    6m42.885s
sys     7m24.460s

[dorian@chdimri dorian]$ cat PrintHeader_test.log | grep Bounding | wc -l
5367
[dorian@chdimri dorian]$ cat C3D_test.log | grep Image | wc -l
5367

dorianps avatar Jan 16 '19 04:01 dorianps

The total number of files minus the number of files the validator could not read roughly equals 4096, which looks like some artificial limit or buffer size.
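
A quick check of that arithmetic, as a sketch only. It assumes the relevant total is the 5367 NIfTI files counted by the header checks above (not the 10740 files in the dataset summary), and it takes the unreadable count from the earlier error listing (10 files listed plus "1282 more"):

```shell
# Counts copied from this thread; pairing them this way is an assumption.
nifti_total=5367   # NIfTI files found by the PrintHeader / c3d checks
listed=10          # files listed under error 44
more=1282          # "... and 1282 more files having this issue"

unreadable=$((listed + more))
readable=$((nifti_total - unreadable))

echo "unreadable:  $unreadable"            # 1292
echo "readable:    $readable"              # 4075
echo "gap to 4096: $((4096 - readable))"   # 21
```

The small gap would be consistent with a few descriptors being used for other things, though that part is a guess.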

chrisgorgo avatar Jan 16 '19 04:01 chrisgorgo

Any ideas @DaNish808?

chrisgorgo avatar Jan 16 '19 04:01 chrisgorgo

Yes, it is indeed hitting the user open file limit. And I managed to resolve the issue. Once I increased my user file limit from the default 4096 to 10000, the validator works fine.

[dorian@chdimri ~]$ ulimit -Hn
10000
[dorian@chdimri ~]$ bids-validator /DATA/dorian/converted_2019/ --ignoreWarnings
This dataset appears to be BIDS compatible.
        Summary:                     Available Tasks:        Available Modalities:
        10740 Files, 113.03GB                                T1w
        445 - Subjects                                       T2w
        7 - Sessions

If you have any questions please post on https://neurostars.org/tags/bids

The question is why the file handles are kept open by Node.js or bids-validator. There is a somewhat similar thread regarding Node.js: https://github.com/nodejs/node/issues/4386
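
One way to watch this while the validator runs is to count the descriptors a process currently holds. This is a minimal sketch for Linux only, since it relies on /proc; count_open_fds is a hypothetical helper name, not part of bids-validator or Node.js:

```shell
# count_open_fds PID: print how many file descriptors PID holds right now.
# Linux-only: reads /proc/<pid>/fd. Prints 0 if that directory is missing.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Demo on the current shell; for the validator, pass its PID instead.
echo "descriptors open in this shell: $(count_open_fds $$)"
```

If the number for the validator process climbs steadily toward the limit during a run, that would point to handles being opened faster than they are closed.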

P.S. Note that increasing the user file limit (the hard limit reported by ulimit -Hn) requires admin privileges, so it's not feasible for regular users on institutional clusters. I also checked the user file limit on a computing cluster at a major institution, and it is simply 4096.

dorianps avatar Jan 16 '19 05:01 dorianps

ulimit -Hn if you want to check it yourself.
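
Note that -Hn reports the hard limit (the ceiling), while the limit actually enforced on a process is the soft limit (-Sn). A small sketch for comparing the two and, where the soft limit is lower, raising it to the hard limit for the current shell only, which does not need admin rights:

```shell
# Print the soft (enforced) and hard (ceiling) open-file limits.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft"
echo "hard limit: $hard"

# Raise the soft limit up to the hard limit for this shell session.
# Skipped when either value is "unlimited" to avoid comparing non-numbers.
if [ "$soft" != "unlimited" ] && [ "$hard" != "unlimited" ] && [ "$soft" -lt "$hard" ]; then
    ulimit -Sn "$hard"
fi
echo "soft limit now: $(ulimit -Sn)"
```

This only helps when the hard limit is already higher than the number of files being opened; going beyond the hard limit still needs an administrator.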

dorianps avatar Jan 16 '19 05:01 dorianps