bids-validator
Error 44 on good files
Following up on the thread https://github.com/bids-standard/bids-validator/issues/671, version 1.1.1 continues to have this issue. Certain files that look fine and are viewable in a viewer like ITK-SNAP (see attached) are flagged as
"We were unable to read this file. Make sure it contains data (fileSize > 0 kB) and is not corrupted, incorrectly named, or incorrectly symlinked."
Could you share the files?
Or at least one example
We can't share the original file, unfortunately.
@sandhitsu, can you empty the image (zero voxels) and check it again? If bids-validator still complains, we can share the empty image with @chrisfilo.
If the empty file resolves the problem, then the problem is the original file. We can share the original file, but only privately with @chrisfilo (not online), and we need to change the subject ID to some random number. We need to respect the data agreements to the letter, unfortunately.
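(For illustration only: zeroing out the voxels while keeping the header could be done with something like the following, assuming FSL or Convert3D is available; file names are placeholders, not necessarily what was actually used.)
fslmaths sub-XXXX_ses-01_run-01_T1w.nii.gz -mul 0 sub-XXXX_zeroed_T1w.nii.gz   # multiply all voxels by 0
# or, with Convert3D:
c3d sub-XXXX_ses-01_run-01_T1w.nii.gz -scale 0 -o sub-XXXX_zeroed_T1w.nii.gz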
Zeroing out the image does not solve the problem. Here is the zeroed image: sub-XXXX_ses-01_run-01_T1w.nii.gz
Could not replicate:
me@christop ~/Downloads $ node --version
v11.4.0
me@christop ~/Downloads $ bids-validator --version
1.1.1
me@christop ~/Downloads $ bids-validator test_ds
This dataset appears to be BIDS compatible.

        Summary:                 Available Tasks:        Available Modalities:
        2 Files, 51.94KB                                 T1w
        1 - Subject
        1 - Session
If you have any questions please post on https://neurostars.org/tags/bids
Perhaps a permissions issue or something to do with a network drive?
PS Storing personally identifiable information in participant labels is a bad and dangerous practice. I edited your original post to remove the screenshot, which included the participant label.
Oops, I thought I had cropped the title bar so that the ID wouldn't show; it must have been in the info tab then. Sorry about that. I'll download your zip file and test. Thanks for looking into this.
The downloaded zip file passes validation. Here is what is happening: when I copy the subject directory somewhere to make a single-subject directory tree, it passes, but as part of the larger dataset, it fails.
The software is pretty fast at checking thousands of files. I don't know the details of how the disk I/O is implemented, but this will sound weird: it feels like either the software or the disk runs out of gas!
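(For illustration, the single-subject test described above could be set up along these lines; paths are placeholders, not the actual commands used.)
mkdir -p /tmp/single_subject_ds
cp /path/to/full_dataset/dataset_description.json /tmp/single_subject_ds/   # keep the required top-level metadata
cp -r /path/to/full_dataset/sub-XXXX /tmp/single_subject_ds/                # copy just one subject
bids-validator /tmp/single_subject_ds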
Sandy, you checked the permissions, right? You also checked a dataset with a single subject and it didn't work with that NIfTI file? What's not clear is why the empty image does not work if simply copying the original image works. The empty image is just a copied image.
It's not that the empty image works and the original doesn't. The same file passes validation if it's part of a single-subject tree and doesn't if it's part of the larger dataset. This is true for empty and non-empty images alike.
Following up on this after further experimentation. It turns out that the file-reading issue mentioned above happens with the standalone command-line version 1.1.1, but not with the Docker version. To summarize, some images are not readable (empty or non-empty doesn't matter) by the command-line version ONLY when they are part of a larger dataset tree. In a single-subject dataset, there is no error. I have copied the data onto multiple mounted filesystems and that doesn't matter.
Is there a difference in the version of Node.js between the container and bare metal installation?
What do you mean by "they are part of a larger dataset tree"?
Could you share the Docker command you used?
I'll have to get back to you on your first question later.
What I mean is that the same subject directory tree, when validated alone (as the only subject in a dataset tree), passes validation. However, it throws an error when the same subject directory is one of many. A few weeks ago, when you couldn't replicate the error with the sample dataset I provided, this is what happened: I couldn't replicate it either when I tested it alone as a single-subject dataset, but when I put it back in the larger dataset, the error reappeared.
I am using
docker run -ti --rm -v $PWD:/data:ro bids/validator --verbose /data
Could you try
docker run -ti --rm -v $PWD:/data:ro bids/validator:1.1.1 --verbose /data
?
I did. It appeared to download and then run. Did not throw the error.
The local Node.js is at v10.15.0. We upgraded it recently to make sure that is not the problem.
Not sure how to find the Node version within the Docker image. None of these work:
[dorian@chdimri ~]$ docker run -ti bids/validator:latest node --version
0.0.0
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 node --version
0.0.0
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 node
node does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 "node --version"
node --version does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 /bin/bash
/bin/bash does not exist
[dorian@chdimri ~]$ docker run -ti bids/validator:1.1.1 /bin/bash -c "node --version"
/bin/bash does not exist
[dorian@chdimri ~]$ docker run -ti --rm bids/validator:1.1.1 /bin/bash -c "node --version"
/bin/bash does not exist
[dorian@chdimri ~]$ node --version
v10.15.0
The Dockerfile shows that you are using the node v8.11.3 image.
docker run -ti --rm --entrypoint=node bids/validator:1.1.1 --version
[dorian@chdimri ~]$ docker run -ti --rm --entrypoint=node bids/validator:1.1.1 --version
Unable to find image 'bids/validator:1.1.1' locally
1.1.1: Pulling from bids/validator
a073c86ecf9e: Pull complete
becc6a89816a: Pull complete
fa183c3e7c21: Pull complete
e2dada1dea71: Pull complete
df496f65d26c: Pull complete
Digest: sha256:66c42b3748d6dcf4f64cc85d0821870a5b3d87882b3cdee22ca4ce1b052425bd
Status: Downloaded newer image for bids/validator:1.1.1
v8.11.3
Well, I am not sure what is going on. Things should work with 10.15. Unfortunately, I cannot replicate your issues locally.
Obviously the best option would be to give you the data, but that may require weeks of preparing user agreements and signatures. Do you think an interactive session via WebEx would help, where we type in the commands you need to investigate the source of the problem?
It would help, but unfortunately I do not have the resources to provide that level of user support. Best to stick with Docker for now. Maybe another user will run into this and be able to share the data. Maybe the next refactoring will make this issue go away.
Just an update.
Since Docker is working and the local npm install is not, we tried broadening the permissions of the npm installation folders and of the data folder. None of this worked.
We then observed that only 1281 NIfTI files out of 5000+ show an access error, and, of course, all of them are there and accessible. From the subject IDs we could guess that file access was interrupted at some point during the run. To verify whether this is the issue, we split the dataset into two parts and, surprise, everything works fine.
So, this is definitely a file-access issue when thousands of files are accessed rapidly. This can probably be attributed to Python. I am not sure how to test this, or whether Python 3 would be compatible with npm/bids-validator. So far I have tried the system Python 2.7.5 and the Python 2.7.14 that came with Anaconda 2.
@chrisfilo , any idea on this? Have you ever tried to test datasets with many files (i.e., 5000+) ?
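(For illustration, the two-way split described above could be done along these lines, using hard-link copies so no extra disk space is needed; the directory names here are placeholders, not the actual paths used.)
mkdir -p /DATA/dorian/Pack1 /DATA/dorian/Pack2
cp /DATA/dorian/converted_2019/dataset_description.json /DATA/dorian/Pack1/
cp /DATA/dorian/converted_2019/dataset_description.json /DATA/dorian/Pack2/
i=0
for sub in /DATA/dorian/converted_2019/sub-*; do
    # alternate subjects between the two halves
    if [ $((i % 2)) -eq 0 ]; then dest=/DATA/dorian/Pack1; else dest=/DATA/dorian/Pack2; fi
    cp -rl "$sub" "$dest"/      # -l makes hard links (same filesystem), so nothing is duplicated
    i=$((i + 1))
done
bids-validator --ignoreWarnings /DATA/dorian/Pack1
bids-validator --ignoreWarnings /DATA/dorian/Pack2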
bids-validator is written in JavaScript and does not use Python.
Try https://github.com/bids-standard/bids-validator/issues/677#issuecomment-454534054
On Tue, Jan 15, 2019, 5:32 PM dorianps wrote:
Tried a few more things, but haven't yet found the source of the problem.
- Ran gunzip on the whole dataset to convert all .nii.gz files to .nii, thus avoiding the need to decompress with zlib on the fly (see the sketch after this list). No solution: bids-validator produces Error 44 (file access error).
- Split the new dataset composed of .nii files into two datasets. This time bids-validator successfully passes each individual dataset, same as with .nii.gz files.
- Uninstalled bids-validator and Node.js v10.15 to install the older Node.js v8.11.4 and bids-validator on top of it. No solution: the full dataset gives an error.
- Uninstalled Node.js v8.11.4 and installed the latest development Node.js v11.6, then installed bids-validator. No solution: the full dataset gives an error. I even tried running things as root, to make sure there is no permission error.
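(For reference, the recursive gunzip in the first item could be done along these lines; not necessarily the exact invocation used.)
find /DATA/dorian/converted_2019/ -name '*.nii.gz' -exec gunzip {} +
# or simply: gunzip -r /DATA/dorian/converted_2019/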
This is just a sample of the last validation attempts on the full dataset and the split datasets:
[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019/
        1: [ERR] Not a valid JSON file. (code: 27 - JSON_INVALID)
                ./sub-R733967895/ses-05/anat/sub-R733967895_ses-05_run-01_T2w.json
        2: [ERR] We were unable to read this file. Make sure it contains data (fileSize > 0 kB) and is not corrupted, incorrectly named, or incorrectly symlinked. (code: 44 - FILE_READ)
                ./sub-R733967895/ses-05/anat/sub-R733967895_ses-05_run-01_T2w.nii
                ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-01_T1w.nii
                ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-01_T2w.nii
                ./sub-R735322929/ses-01/anat/sub-R735322929_ses-01_run-02_T1w.nii
                ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-01_T1w.nii
                ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-01_T2w.nii
                ./sub-R735322929/ses-02/anat/sub-R735322929_ses-02_run-02_T1w.nii
                ./sub-R735322929/ses-03/anat/sub-R735322929_ses-03_run-01_T2w.nii
                ./sub-R735322929/ses-03/anat/sub-R735322929_ses-03_run-02_T1w.nii
                ./sub-R735322929/ses-05/anat/sub-R735322929_ses-05_run-01_T1w.nii
                ... and 1282 more files having this issue (Use --verbose to see them all).

        Summary:                   Available Tasks:        Available Modalities:
        10740 Files, 113.03GB                              T1w
        445 - Subjects                                     T2w
        7 - Sessions

If you have any questions please post on https://neurostars.org/tags/bids

[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019Pack1/
This dataset appears to be BIDS compatible.

        Summary:                   Available Tasks:        Available Modalities:
        5199 Files, 55.67GB                                T1w
        213 - Subjects                                      T2w
        7 - Sessions

If you have any questions please post on https://neurostars.org/tags/bids

[root@chdimri dorian]# bids-validator --ignoreWarnings /DATA/dorian/converted_2019Pack2/
This dataset appears to be BIDS compatible.

        Summary:                   Available Tasks:        Available Modalities:
        5537 Files, 57.35GB                                T1w
        232 - Subjects                                      T2w
        7 - Sessions
For what it's worth, I verified that the NIfTI headers can be read for all NIfTI files in the BIDS folder. Tried both PrintHeader from ANTs and c3d from ITK-SNAP.
[dorian@chdimri dorian]$ time find /DATA/dorian/converted_2019/ -iname '*.nii' -exec sh -c 'PrintHeader {} | grep Bounding' \; > /DATA/dorian/PrintHeader_test.log
real 10m5.339s
user 4m40.305s
sys 6m4.294s
[dorian@chdimri dorian]$ time find /DATA/dorian/converted_2019/ -iname '*.nii' -exec sh -c 'c3d {} -info' \; > /DATA/dorian/C3D_test.log
real 14m9.197s
user 6m42.885s
sys 7m24.460s
[dorian@chdimri dorian]$ cat PrintHeader_test.log | grep Bounding | wc -l
5367
[dorian@chdimri dorian]$ cat C3D_test.log | grep Image | wc -l
5367
The total number of files minus the files the validator could not read is roughly 4096, which looks like some artificial limit or buffer.
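(One way to check whether this matches the per-process open-file limit, and to watch how many descriptors the validator holds while it runs; the pgrep pattern is an assumption about how the process shows up.)
ulimit -Sn                                                  # soft open-file limit for processes started from this shell
ulimit -Hn                                                  # hard open-file limit
find /DATA/dorian/converted_2019/ -iname '*.nii' | wc -l    # total .nii count, for the subtraction above
# while the validator is running in another terminal:
ls /proc/$(pgrep -f bids-validator | head -1)/fd | wc -l    # descriptors currently held by the process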
Any ideas @DaNish808?
Yes, it is indeed hitting the user open-file limit, and I managed to resolve the issue. Once I increased my open-file limit from the default 4096 to 10000, the validator works fine.
[dorian@chdimri ~]$ ulimit -Hn
10000
[dorian@chdimri ~]$ bids-validator /DATA/dorian/converted_2019/ --ignoreWarnings
This dataset appears to be BIDS compatible.
        Summary:                   Available Tasks:        Available Modalities:
        10740 Files, 113.03GB                              T1w
        445 - Subjects                                     T2w
        7 - Sessions
If you have any questions please post on https://neurostars.org/tags/bids
The question is why the file descriptors are kept open by Node.js or bids-validator. There is another somewhat similar thread regarding Node.js: https://github.com/nodejs/node/issues/4386
P.S. Note that increasing the user file limit requires admin privileges, so it's not feasible for regular users on institutional clusters. I also checked the user file limit on a computing cluster at a major institution, and it's simply 4096. Run
ulimit -Hn
if you want to check it yourself.
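(For reference, a sketch of the relevant commands: the soft limit can usually be raised up to the hard limit without root, but raising the hard limit persistently typically needs an admin, e.g. via /etc/security/limits.conf.)
ulimit -Sn          # current soft limit (often 1024)
ulimit -Hn          # current hard limit (4096 in the cases above)
ulimit -n 10000     # fails for a regular user if 10000 exceeds the hard limit
# as root, a persistent per-user raise (assuming pam_limits is in use), in /etc/security/limits.conf:
#   dorian  soft  nofile  10000
#   dorian  hard  nofile  10000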