CI workflow file (`validator.yml`) is in need of maintenance (dependency versions, disk space, speed)
The `validator.yml` GitHub Actions CI workflow is currently failing on all PRs, blocking merges even of approved ones.
@mguaypaq helped to summarize many of the issues:
It's been a while, but from what I remember:
- [x] The code is unnecessarily slow to run (because it re-runs many calls to pybids, in particular)
- [x] It's still pinned to the very much EOL Python 3.7 (see the setup-python sketch after this list)
- [ ] Some of its dependencies are not backwards compatible
- [ ] It relies on an old file naming scheme that we've since changed
- [x] It may be running out of disk space on the github actions runner? But there's a code comment somewhere in the workflow files about how to get more space?
- [ ] Some of the checks that it does seem... wrong? Like, a missing `abs()` around a comparison of two floating-point numbers for approximate equality (i.e. `a - b < tol` where it should be `abs(a - b) < tol`), for example
- [x] It also looks like the return code of the checker is just ignored by the workflow file? But it doesn't even install anymore, and that results in a workflow failure
- [x] No automatic retries on failed `git annex get`s (with `-J8`)
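For the Python pin specifically, a minimal sketch of the kind of version bump that might be needed, assuming the workflow sets up its interpreter with actions/setup-python (the actual step in `validator.yml` may look different):

```yaml
# Hypothetical excerpt; the real step in validator.yml may differ.
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    # 3.7 is end-of-life; any currently supported release would do here.
    python-version: "3.12"
```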
I'm very interested in fixing the workflow to help unblock current and future PRs! (I plan to start with the more "maintenance"-y tasks, then move on to the "correctness" tasks related to the validation itself.)
Another small point: this line is very flaky, and tends to make the workflow fail: https://github.com/spine-generic/data-multi-subject/blob/5398c0faf62b02e0475f76498a8af24c0c3722c2/.github/workflows/validator.yml#L98
Usually it's just a transient network error while downloading a few of the files. So, we could make it more robust with a simple:
# try a second time if a few downloads failed the first time
git annex get -J8 || git annex get
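In `validator.yml`, that could be folded into the existing download step; a rough sketch (the step name here is illustrative, not the actual one in the workflow):

```yaml
- name: Download annexed data
  run: |
    # try a second time if a few downloads failed the first time
    git annex get -J8 || git annex get
```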
A hopeful start: The disk space issues have a very quick, very neat solution. :tada:
At the start of the workflow, things look like this:
| Free | Size | Mounted on | Filesystem |
|------|------|------------|------------|
| 66G  | 74G  | `/mnt`      | `/dev/sdb1`  |
| 21G  | 73G  | `/`         | `/dev/root`  |
| 99M  | 105M | `/boot/efi` | `/dev/sda15` |
Our PWD is on the "21G free" disk. But note `/mnt`, which has 66GB free (!!!). (Notably, it looks like this temp disk used to be 14GB.)
We can take advantage of all of this extra space using the "Maximize build disk space" action. After running this action, `df` looks like this:
| Free | Size | Mounted on | Filesystem |
|------|------|------------|------------|
| 87G  | 87G  | `/home/runner/work/data-multi-subject/data-multi-subject` | `/dev/mapper/buildvg-buildlv` |
| 512M | 73G  | `/`         | `/dev/root`  |
| 100M | 74G  | `/mnt`      | `/dev/sdb1`  |
| 99M  | 105M | `/boot/efi` | `/dev/sda15` |
Thanks to the Logical Volume Manager (LVM), we now have access to the entirety of the 87G, with a step that takes 2 seconds, as opposed to the 3+ minutes it takes to remove unwanted software.
I was a little concerned about whether this would result in slower read/write times (because of LVM), but the `git annex` step seems to take ~10m either way. :)
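For reference, here's a sketch of the step I have in mind, assuming the "Maximize build disk space" action is easimon/maximize-build-space (the version tag and input value are illustrative; the defaults may well be enough):

```yaml
- name: Maximize build disk space
  uses: easimon/maximize-build-space@v10
  with:
    # keep a little room on / for the runner itself (illustrative value)
    root-reserve-mb: 1024
```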
Hey there!
Do you really need to download the annexed data to run the BIDS spec and protocol compliance checks?
If I am not mistaken, these checks only read data from the JSON sidecars; you never open the NIfTI files, right?
You can set the bids-validator to ignore the broken annex symlinks with `--config.ignore=43` (it still validates the naming of the NIfTI files).
At least, that's the way we run our tests for cneuromod.
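In the workflow, that might look something like this (a sketch only: the step name and dataset path are illustrative, it assumes the bids-validator CLI is already installed, and, per the above, issue 43 is the check tripped by the broken annexed symlinks):

```yaml
- name: Run BIDS validator without downloading annexed data
  run: bids-validator . --config.ignore=43
```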
That is a great suggestion, thank you @bpinsard!
By default, bids-validator does open the NIfTI files to do some verification of the headers. I'm not sure how extensive the checks are, but I do know that it checks for broken/missing s-form and q-form codes, which is useful (and we fixed some internal datasets thanks to these warnings).
But if we really can't fit the datasets on disk, that's a good suggestion, thanks. (I guess we could add `NIFTI_TOO_SMALL` to `.bids-validator-config.json` for the same effect, but that would disable the check for everyone. Using the command-line option like you suggest is more appropriate, so that we only disable the check in this particular CI environment.)
I see; in your case, where you don't necessarily control the software used for the conversion to NIfTI, you likely want these checks to be active.
FYI: I got here again because I noticed you developed custom protocol compliance tests, and I have been using your dataset to test a still-WIP generic tool for that purpose: https://github.com/UNFmontreal/forbids