Use tar --no-same-owner option for untar module
The dev container uses the root user in the container: https://github.com/nf-core/modules/blob/113690eee93197a05f32ca8ff2f175930f6568fb/.devcontainer/devcontainer.json#L12
When run as root, `tar -x` preserves the ownership (uid/gid) of files in the tarball upon extraction. This can result in an error when using rootless Podman if the dev container's localWorkspaceFolder, i.e.:
https://github.com/nf-core/modules/blob/113690eee93197a05f32ca8ff2f175930f6568fb/.devcontainer/devcontainer.json#L5-L8
resides on an NFS file system (see Rootless Podman and NFS for more details); e.g.:
```
/workspaces/modules -> nf-test test modules/nf-core/untar/tests/main.nf.test
...
Command error:
kraken2/opts.k2d
tar: opts.k2d: Cannot change ownership to uid 501, gid 50: Operation not permitted
kraken2/taxo.k2d
tar: taxo.k2d: Cannot change ownership to uid 501, gid 50: Operation not permitted
kraken2/hash.k2d
tar: hash.k2d: Cannot change ownership to uid 501, gid 50: Operation not permitted
tar: Exiting with failure status due to previous errors
```
The solution proposed by this PR is to add the GNU tar `--no-same-owner` option so that extracted files are owned by the user running the tar command (in the preceding scenario, root in the dev container's user namespace, which is mapped to the user running Podman on the host).
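For illustration, here is the general shape of the change (a minimal sketch only; the archive name and extraction flags are placeholders, not the module's exact command):

```bash
# Running as root, GNU tar defaults to --same-owner and tries to chown the
# extracted files to the uid/gid recorded in the archive, which is what fails
# on rootless Podman over NFS.
tar -xf database.tar.gz

# With --no-same-owner, extracted files are owned by the invoking user and no
# chown is attempted during extraction.
tar --no-same-owner -xf database.tar.gz
```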
PR checklist
- [X] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the module conventions in the contribution docs
- [ ] If necessary, include test data in your PR.
- [ ] Remove all TODO statements.
- [ ] Emit the `versions.yml` file.
- [ ] Follow the naming conventions.
- [ ] Follow the parameters requirements.
- [ ] Follow the input/output options guidelines.
- [ ] Add a resource `label`
- [ ] Use BioConda and BioContainers if possible to fulfil software requirements.
- Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
  - For modules:
    - [ ] `nf-core modules test <MODULE> --profile docker`
    - [X] `nf-core modules test <MODULE> --profile singularity`
    - [ ] `nf-core modules test <MODULE> --profile conda`
  - For subworkflows:
    - [ ] `nf-core subworkflows test <SUBWORKFLOW> --profile docker`
    - [ ] `nf-core subworkflows test <SUBWORKFLOW> --profile singularity`
    - [ ] `nf-core subworkflows test <SUBWORKFLOW> --profile conda`
Might need to run the CI checks again; it looks like there were some network and/or disk-full related errors on the self-hosted runners, e.g.:
https://github.com/nf-core/modules/actions/runs/19082094294/job/55387492031#step:6:1057
```
ERROR ~ Error executing process > 'TXIMETA_TXIMPORT'
Caused by:
Failed to pull singularity image
command: singularity pull --name depot.galaxyproject.org-singularity-bioconductor-tximeta%3A1.20.1--r43hdfd78af_0.img.pulling.1763111870007 https://depot.galaxyproject.org/singularity/bioconductor-tximeta%3A1.20.1--r43hdfd78af_0 > /dev/null
status : 143
hint : Try and increase singularity.pullTimeout in the config (current is "20m")
message:
INFO: Downloading network image
```
https://github.com/nf-core/modules/actions/runs/19082094294/job/55387491920#step:6:831
> Command error:
> Unable to find image 'quay.io/biocontainers/bioconductor-tximeta:1.20.1--r43hdfd78af_0' locally
> 1.20.1--r43hdfd78af_0: Pulling from biocontainers/bioconductor-tximeta
> fa7e54f17dc0: Pulling fs layer
> 4ca545ee6d5d: Pulling fs layer
> f76401802415: Pulling fs layer
> 4ca545ee6d5d: Verifying Checksum
> 4ca545ee6d5d: Download complete
> fa7e54f17dc0: Verifying Checksum
> fa7e54f17dc0: Download complete
> fa7e54f17dc0: Pull complete
> 4ca545ee6d5d: Pull complete
> docker: write /var/lib/docker/tmp/GetImageBlob4044426464: no space left on device
https://github.com/nf-core/modules/actions/runs/19082094294/job/55387491895#step:6:803
> Command error:
> Unable to find image 'quay.io/biocontainers/rtg-tools:3.12.1--hdfd78af_0' locally
> 3.12.1--hdfd78af_0: Pulling from biocontainers/rtg-tools
> c1a16a04cedd: Already exists
> 4ca545ee6d5d: Already exists
> 5c8d8c55d21b: Pulling fs layer
> 5c8d8c55d21b: Verifying Checksum
> 5c8d8c55d21b: Download complete
> docker: failed to register layer: write /usr/local/lib/libxcb-render.so.0.0.0: no space left on device
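As an aside, the timeout hint in the first error above refers to Nextflow's `singularity.pullTimeout` setting; a minimal sketch of raising it (the config file and the value are illustrative, not something this PR changes):

```bash
# Append a higher Singularity pull timeout to a Nextflow config file;
# the '40 min' value is arbitrary (the error shows the current 20m default).
cat >> nextflow.config <<'EOF'
singularity.pullTimeout = '40 min'
EOF
```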
I suspect that this change is triggering far more tests than normally run, resulting in disk space exhaustion on the runners.
To test this hypothesis, in a fork of nf-core/modules I switched to the GitHub-hosted runners and used the secondary /mnt partition for conda environments, Docker containers, and Nextflow/nf-test work directories (the technique described in https://github.com/nf-core/modules/issues/7016#issuecomment-3548274321). This resulted in substantially more checks passing:
https://github.com/fasrc/modules/actions/runs/19448287416
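For anyone curious, a hypothetical sketch of that /mnt technique on a GitHub-hosted runner follows; all paths and values are illustrative, not the exact workflow steps used in the fork.

```bash
# Move the Docker daemon's storage onto the larger /mnt partition.
echo '{ "data-root": "/mnt/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

# Point Nextflow's conda cache and work directory at /mnt as well.
sudo mkdir -p /mnt/conda-cache /mnt/nf-work
sudo chown "$USER" /mnt/conda-cache /mnt/nf-work
export NXF_CONDA_CACHEDIR=/mnt/conda-cache
export NXF_WORK=/mnt/nf-work
```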
There were still failures, though; some of them seem to have plausible explanations, which may indicate those tests haven't been run in a while.
For example, CELLRANGERARC_MKFASTQ failed in some of the docker and singularity shards because its output differed from the snapshot (https://github.com/fasrc/modules/actions/runs/19448287416/job/55715338873#step:5:449). However, its results are non-deterministic due to multithreading:
https://github.com/nf-core/modules/blob/e753770db613ce014b3c4bc94f6cba443427b726/modules/nf-core/cellrangerarc/mkfastq/main.nf#L5
Also, METAPHLAN3_MERGEMETAPHLANTABLES appears to be subject to bit rot: https://github.com/fasrc/modules/actions/runs/19448287416/job/55715337100#step:5:929
> File "/mnt/runner/nf-test/tests/dd18346802b2188d78a7d615dbbe57ba/work/conda/env-4976d331465f416f-eb1330aa155c9242f2a03346deb44f4d/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 26, in <module>
> from distutils.version import LooseVersion
> ModuleNotFoundError: No module named 'distutils'
distutils was removed in Python 3.12 (https://peps.python.org/pep-0632/), and isn't going to be present in the Python 3.13 installed in the conda environment.
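For reference, the failure reproduces on any Python >= 3.12 interpreter, and one possible direction (purely a sketch; the environment name and version pin are assumptions on my part, not a concrete proposal for the module) would be to pin the interpreter when the conda environment is resolved:

```bash
# Confirm that distutils is gone from the standard library in Python >= 3.12.
python3.13 -c "from distutils.version import LooseVersion"
# -> ModuleNotFoundError: No module named 'distutils'

# Hypothetical workaround: pin the interpreter below 3.12 when resolving the
# environment for the metaphlan 3.0.12 bioconda package (env name illustrative).
conda create -n metaphlan3 -c conda-forge -c bioconda 'python<3.12' metaphlan=3.0.12
```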
I'm not sure how to proceed with this one; any guidance would be appreciated!
Per nf-core slack, I temporarily increased max_shards from 15 to 30. Will revert after tests have run and before PR is merged.
Still running out of space with 30 shards. Bumping max_shards to 60 to see if that's sufficient...?
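For context, max_shards only controls how many slices the test list is cut into; each CI job then runs one slice, roughly like the sketch below (the shard index, total, and flags are illustrative, and I'm assuming nf-test's `--shard` option is what the workflow uses under the hood):

```bash
# One CI job runs one slice of the full test set; raising max_shards means
# fewer tests (and less disk usage) per runner.
nf-test test --profile docker --shard 7/60
```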
Try merging the master branch into your branch, that should lower the number of tests again I think!
> Try merging the master branch into your branch, that should lower the number of tests again I think!
@famosab Thanks for the tip! That substantially reduced the number of test failures.
There are still some test failures; none of which seem to be related to the change in the tar command proposed in this PR.
These are the remaining no-space-left-on-device errors. I could try temporarily bumping up max_shards further?
This job seemed to have a network issue ("FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: error writing layer: unexpected EOF") while generating a SIF---perhaps a rerun could fix?
These tests fail because the metaphlan 3.0.12 bioconda package references a Python library that was removed in Python 3.12 (and is not present in the Python 3.13 currently installed into the environment):
```
Test Process METAPHLAN3_MERGEMETAPHLANTABLES
...
File "/home/runner/_work/modules/modules/.nf-test/tests/1b6ae314ec0ac11e9fbfa4b41ccbfacc/work/conda/env-a704cd16283e2a0e-eb1330aa155c9242f2a03346deb44f4d/lib/python3.13/site-packages/metaphlan/metaphlan.py", line 26, in <module>
from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'
```
This issue was apparently fixed in MetaPhlAn 4.2.0:
https://github.com/biobakery/MetaPhlAn/pull/232#issuecomment-2921660017
Another MetaPhlAn 3 issue, possibly related to the Python version:
```
Test Process METAPHLAN3_MERGEMETAPHLANTABLES
...
com.fasterxml.jackson.dataformat.yaml.snakeyaml.error.MarkedYAMLException: while scanning a simple key
in 'reader', line 3, column 1:
line
k^
could not find expected ':'
in 'reader', line 4, column 1:
import
^
at [Source: (InputStreamReader); line: 2, column: 23]
```
A "different snapshot" error in kofamscan output that I'm able to reproduce on the master branch in a codespace with, e.g. nf-test test --profile singularity modules/nf-core/kofamscan/tests/main.nf.test
Another different-snapshot error, also reproducible in a codespace on the master branch, using nf-test test --profile singularity modules/nf-core/foldmason/msa2lddtreport/tests/main.nf.test
A different-snapshot error with hamronization/rgi, reproducible in a codespace on the master branch:
`nf-test test --profile conda modules/nf-core/hamronization/rgi/tests/main.nf.test`
PR to attempt to fix the metaphlan3_metaphlan3 and metaphlan3_mergemetaphlantables errors: https://github.com/nf-core/modules/pull/9448
> PR to attempt to fix the metaphlan3_metaphlan3 and metaphlan3_mergemetaphlantables errors: #9448
This PR was merged, and the metaphlan3 errors should be resolved.
> A different-snapshot error with hamronization/rgi, reproducible in a codespace on the master branch:
> `nf-test test --profile conda modules/nf-core/hamronization/rgi/tests/main.nf.test`
This error should also be fixed, thanks to this bioconda PR that was merged: https://github.com/bioconda/bioconda-recipes/pull/60886
I think the errors in the tcoffee_extractfrompdb tests should be resolved by https://github.com/nf-core/modules/pull/9489
The tcoffee_extractfrompdb tests should now pass, and I further increased the shards from 30 to 60 in an attempt to get the two failing docker tests to pass without "no space left on device" errors. This branch should be ready to test again.