CI workflow file (`validator.yml`) is in need of maintenance (dependency versions, disk space, speed)
The `validator.yml` GitHub Actions CI workflow is currently failing on all PRs, blocking merges even of approved ones.
@mguaypaq helped to summarize many of the issues:
It's been a while, but from what I remember:
- [x] The code is unnecessarily slow to run (because it re-runs many calls to pybids, in particular)
- [x] It's still pinned to the very much EOL Python 3.7 (see the setup-python sketch after this list)
- [ ] Some of its dependencies are not backwards compatible
- [ ] It relies on an old file naming scheme that we've since changed
- [x] It may be running out of disk space on the github actions runner? But there's a code comment somewhere in the workflow files about how to get more space?
- [ ] Some of the checks that it does seem... wrong? Like, a missing `abs()` around a comparison of two floating-point numbers for approximate equality (i.e. `a - b < tol` where it should be `abs(a - b) < tol`), for example
- [x] It also looks like the return code of the checker is just ignored by the workflow file? But it doesn't even install anymore, and that results in a workflow failure
- [x] No automatic retries on failed `git annex get`s (with `-J8`)
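For the Python pin specifically, a minimal sketch of the kind of version bump that might be needed, assuming the workflow sets up its interpreter with actions/setup-python (the actual step in `validator.yml` may look different):

```yaml
# Hypothetical excerpt; the real step in validator.yml may differ.
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    # 3.7 is end-of-life; any currently supported release would do here.
    python-version: "3.12"
```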
I'm very interested in fixing the workflow to help unblock current and future PRs! (I plan to start with the more "maintenance"-y tasks, then move on to the "correctness" tasks related to the validation itself.)
Another small point: this line is very flaky, and tends to make the workflow fail: https://github.com/spine-generic/data-multi-subject/blob/5398c0faf62b02e0475f76498a8af24c0c3722c2/.github/workflows/validator.yml#L98
Usually it's just a transient network error while downloading a few of the files. So, we could make it more robust with a simple:
# try a second time if a few downloads failed the first time
git annex get -J8 || git annex get
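In `validator.yml`, that could be folded into the existing download step; a rough sketch (the step name here is illustrative, not the actual one in the workflow):

```yaml
- name: Download annexed data
  run: |
    # try a second time if a few downloads failed the first time
    git annex get -J8 || git annex get
```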
A hopeful start: The disk space issues have a very quick, very neat solution. :tada:
At the start of the workflow, things look like this:
| Free | Size | Mounted on | Filesystem |
|------|------|------------|------------|
| 66G  | 74G  | `/mnt`      | `/dev/sdb1`  |
| 21G  | 73G  | `/`         | `/dev/root`  |
| 99M  | 105M | `/boot/efi` | `/dev/sda15` |
Our PWD is on the "21G free" disk. But note `/mnt`, which has 66GB free (!!!). (Notably, it looks like this temp disk used to be 14GB.)
We can take advantage of all of this extra space using the "Maximize build disk space" action. After running this action, `df` looks like this:
| Free | Size | Mounted on | Filesystem |
|------|------|------------|------------|
| 87G  | 87G  | `/home/runner/work/data-multi-subject/data-multi-subject` | `/dev/mapper/buildvg-buildlv` |
| 512M | 73G  | `/`         | `/dev/root`  |
| 100M | 74G  | `/mnt`      | `/dev/sdb1`  |
| 99M  | 105M | `/boot/efi` | `/dev/sda15` |
Thanks to the Logical Volume Manager (LVM), we now have access to the entirety of the 87G, with a step that takes 2 seconds, as opposed to the 3+ minutes it takes to remove unwanted software.
I was a little concerned about whether this would result in slower read/write times (because of LVM), but the `git annex` step seems to take ~10m either way. :)
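For reference, here's a sketch of the step I have in mind, assuming the "Maximize build disk space" action is easimon/maximize-build-space (the version tag and input value are illustrative; the defaults may well be enough):

```yaml
- name: Maximize build disk space
  uses: easimon/maximize-build-space@v10
  with:
    # keep a little room on / for the runner itself (illustrative value)
    root-reserve-mb: 1024
```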
Hey there!
Do you really need to download the annexed data to run the BIDS spec and protocol compliance checks?
If I am not mistaken, these checks only read data from the JSON sidecars; you never open the NIfTI files, right?
You can set the bids-validator to ignore the broken annex symlinks with `--config.ignore=43` (it still validates the naming of the NIfTI files).
At least, that's the way we run our tests for cneuromod.
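In the workflow, that might look something like this (a sketch only: the step name and dataset path are illustrative, it assumes the bids-validator CLI is already installed, and, per the above, issue 43 is the check tripped by the broken annexed symlinks):

```yaml
- name: Run BIDS validator without downloading annexed data
  run: bids-validator . --config.ignore=43
```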
That is a great suggestion, thank you @bpinsard!
By default, bids-validator does open the NIfTI files to do some verification of the headers. I'm not sure how extensive the checks are, but I do know that it checks for broken/missing s-form and q-form codes, which is useful (and we fixed some internal datasets thanks to these warnings).
But if we really can't fit the datasets on disk, that's a good suggestion, thanks. (I guess we could add `NIFTI_TOO_SMALL` to `.bids-validator-config.json` for the same effect, but that would disable the check for everyone. Using the command-line option like you suggest is more appropriate, so that we only disable the check in this particular CI environment.)
I see; in your case, where you don't necessarily control the software used for the conversion to NIfTI, you likely want these checks to be active.
FYI: I got here again because I noticed you developed custom protocol compliance tests, and I have been using your dataset to test a still-WIP generic tool for that purpose: https://github.com/UNFmontreal/forbids