Handle GitHub runners possibly running out of space
sorry, this feels very draft status. Move it into another PR before merging this one.
Originally posted by @mashehu in https://github.com/nf-core/modules/pull/6286#discussion_r1846901512
WIP code:

```yaml
get-number-of-shards:
  runs-on: ubuntu-latest
  outputs:
    # Needs to be a JSON array
    shards: ${{ steps.shards.outputs.shards }}
    total_shards: ${{ steps.shards.outputs.total_shards }}
  steps:
    - name: Install nf-test
      uses: nf-core/setup-nf-test@v1
      with:
        version: ${{ env.NFT_VER }}
    - id: shards
      run: |
        nftest_output=$(nf-test test --dry-run --changed-since HEAD^ --filter process --follow-dependencies)
        number_of_shards=$(echo "$nftest_output" | grep -o 'Found [0-9]* related test' | tail -1 | awk '{print $2}')
        # Rough shard count at three tests per shard (currently unused)
        three_tests_per_shard=$(( (number_of_shards + 2) / 3 ))
        # Compact (-c) JSON so the array fits on a single $GITHUB_OUTPUT line
        shards_array=$(seq 1 "$number_of_shards" | jq -R . | jq -s -c .)
        echo "shards=${shards_array}" >> $GITHUB_OUTPUT
        echo "total_shards=${number_of_shards}" >> $GITHUB_OUTPUT
```
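For context, the shard list would presumably feed a downstream matrix job; a rough sketch (the job name and trailing nf-test options are hypothetical), using nf-test's `--shard i/n` flag:

```yaml
# Hypothetical consumer of the outputs above
nf-test:
  needs: get-number-of-shards
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: ${{ fromJson(needs.get-number-of-shards.outputs.shards) }}
  steps:
    - name: Run one shard of the test suite
      run: |
        nf-test test \
          --shard ${{ matrix.shard }}/${{ needs.get-number-of-shards.outputs.total_shards }} \
          --changed-since HEAD^ --filter process --follow-dependencies
```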
Tested in #6716 with some examples. We'll see how many issues we run into with it.
Wondering if we could use Fusion with S3 locally to avoid this 🤔
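A minimal sketch of that idea, assuming Fusion via Wave and a placeholder bucket (entirely untested):

```bash
# Hypothetical: give Nextflow an S3 work directory with Fusion enabled, so task
# work files live in object storage instead of the runner's small root disk.
# 'example-bucket' is a placeholder; Fusion requires Wave-provisioned containers
# and AWS credentials available to the runner.
cat >> nextflow.config <<'EOF'
fusion.enabled = true
wave.enabled   = true
workDir        = 's3://example-bucket/work'
EOF
```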
To document this idea somewhere...
The GitHub-hosted runners have relatively little available space on the root partition (18G on ubuntu-latest) due to the size of the default runner image, but they have a separate, mostly unused partition mounted at /mnt (66G available on ubuntu-latest); e.g.:
```console
runner@runnervmg1sw1:~/work/modules/modules$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        72G   55G   18G  76% /
tmpfs           7.9G   84K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb16      881M   62M  758M   8% /boot
/dev/sdb15      105M  6.2M   99M   6% /boot/efi
/dev/sda1        74G  4.1G   66G   6% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001
runner@runnervmg1sw1:~/work/modules/modules$ ls -ld /mnt
drwxr-xr-x 3 root root 4096 Nov 17 21:38 /mnt
runner@runnervmg1sw1:~/work/modules/modules$ ls -lh /mnt
total 4.1G
-rw-r--r-- 1 root root  333 Nov 17 21:38 DATALOSS_WARNING_README.txt
drwx------ 2 root root  16K Nov 17 21:38 lost+found
-rw------- 1 root root 4.0G Nov 17 21:38 swapfile
```
Relocating the Nextflow and nf-test workdirs, the Docker daemon data directory, and the conda environments to /mnt reduces the likelihood that a GitHub-hosted runner runs out of space. For example, in .github/actions/nf-test-action/action.yml, modify the following job step:
```yaml
- name: Run nf-test
  ...
  run: |
    # Create a world-writable working area on the large /mnt partition
    sudo mkdir -m 777 -p /mnt/runner
    # Move Docker's data root (images, containers, build cache) to /mnt
    echo '{"data-root": "/mnt/docker"}' | sudo tee /etc/docker/daemon.json
    sudo systemctl restart docker
    # Point nf-test, Nextflow, Singularity, and conda at /mnt
    export NFT_WORKDIR=/mnt/runner/nf-test
    export NXF_WORK=/mnt/runner/work
    export NXF_SINGULARITY_CACHEDIR=/mnt/runner/singularity-cachedir
    export CONDA_ENVS_DIRS=/mnt/runner/conda/envs
    conda config --prepend pkgs_dirs /mnt/runner/conda/pkgs_dirs
    nf-test test \
      ...
```
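One can sanity-check that the relocation took effect with something like:

```bash
# Docker's data root should now be on the large partition
docker info --format '{{ .DockerRootDir }}'   # expect: /mnt/docker
# Watch headroom on both partitions while tests run
df -h / /mnt
```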
I've used this method to run the nf-test workflow in a fork (substituting `runs-on: ubuntu-latest`).
Interesting, would you mind opening a PR with these changes to the action?
I haven't tested this approach with the AWS-hosted runners that nf-core/modules uses; do they have only one large root filesystem instead? (FWIW, I used mxschmitt/action-tmate to poke around a GitHub-hosted runner interactively; see the sketch below.)
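For reference, that debug step looks roughly like this (the `@v3` tag is an assumption; pin whatever release is current):

```yaml
# Pauses the job and prints SSH/web connection details for an interactive shell
- name: Open a tmate session on the runner
  uses: mxschmitt/action-tmate@v3
```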