[ci] Upload disk and snapcraft diagnostics
Add disk usage diagnostics to capture disk space information before heavy build steps:
- [x] Analyze the linux.yml workflow structure to identify where to insert diagnostics
- [x] Add disk diagnostics step before the main Build step (line 132)
- [x] Add disk diagnostics step before the "Build and verify the snap" step (line 308)
- [x] Add single artifact upload step at the end to capture all diagnostics
- [x] Verify the YAML syntax is correct
- [x] Run security checks (CodeQL) - no issues found
- [x] Address review feedback
- [x] All tasks completed
Summary
Added disk usage diagnostics to .github/workflows/linux.yml to help diagnose the failing job https://github.com/canonical/multipass/actions/runs/19534052585/job/55935120817, which may be caused by disk space exhaustion.
Changes Made
Two diagnostic checkpoints added:
- Before Build step (line 132): Captures disk state before initial snapcraft build
- Before Snap Build step (line 308): Captures disk state before full snap creation
Each checkpoint captures:
- `df -h` - disk usage in human-readable format
- `df -i` - inode usage
- `du` on top directories (/root, /home, /tmp, /var, /usr, runner temp/workspace)
- Top 50 largest entries on / (single filesystem only)
- All files >50M with their sizes (sorted numerically)
- Snapcraft logs if available (full log from /home/runner/.local/state/snapcraft/log/)
- GitHub Actions warning if free space < 2GB (first checkpoint only)
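The free-space warning in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the exact step from the workflow: the helper names `free_kb` and `warn_if_low` are hypothetical, the 2 GB default mirrors the threshold mentioned above, and the `::warning::` workflow command is assumed as the way to surface the message (the original prompt wrote a `##[warning]` line into the diagnostics file instead).

```shell
# Print the available space on the given mount point in KB (GNU df assumed).
free_kb() {
  df --output=avail -k "$1" 2>/dev/null | tail -n1 | tr -d ' '
}

# Hypothetical helper: print a warning annotation when avail_kb < threshold_kb.
# The threshold defaults to 2 GB expressed in KB.
warn_if_low() {
  local avail_kb="$1" threshold_kb="${2:-$((2 * 1024 * 1024))}"
  # Skip silently when the value is missing or not a plain number.
  case "$avail_kb" in ''|*[!0-9]*) return 0 ;; esac
  if [ "$avail_kb" -lt "$threshold_kb" ]; then
    echo "::warning::Less than $((threshold_kb / 1024)) MB available ($((avail_kb / 1024)) MB free)"
  fi
}

warn_if_low "$(free_kb /)"
```

Passing the measured value as an argument keeps the comparison separate from the `df` call, so a different threshold can be supplied as the second argument.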
Implementation details:
- Uses `if: always()` to ensure diagnostics run even on job failure
- Uses `continue-on-error: true` to prevent diagnostic failures from failing the job
- Single upload step at the end captures all diagnostic files
- Artifact name: `runner-disk-diagnostics-${{ matrix.build-type }}`
- Artifacts uploaded via `actions/upload-artifact@v4` with wildcard pattern
- Removed the `-x` flag (`set -euxo pipefail` is now `set -euo pipefail`) to reduce noise
- Removed empty echo lines between sections
- Changed `ls -lh` to `ls -l` and sort by size numerically for proper sorting
- Removed snapcraft log capture from first checkpoint (logs don't exist yet)
- Removed disk space warning from second checkpoint (not useful after build)
- Changed snapcraft log from tail to full cat in second checkpoint
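The switch from `ls -lh` to `ls -l` matters because the size column then holds plain byte counts that `sort -n` can order. A minimal sketch of that idea, assuming GNU find and a hypothetical helper name `list_large_files` (the actual workflow step may be written differently):

```shell
# List regular files matching a find(1) size expression under a directory,
# smallest first. Arguments: directory, size expression (for example +50M).
list_large_files() {
  local dir="$1" min_size="$2"
  # ls -l prints the size in bytes in column 5, so sort -n orders it correctly.
  # With ls -lh the column holds values such as 1.5G or 900M, which a
  # numeric sort would misorder.
  find "$dir" -xdev -type f -size "$min_size" -exec ls -l {} + 2>/dev/null \
    | sort -k5 -n || true
}
```

For example, `list_large_files / +50M | tail -n 50` would show the 50 largest matches on the root filesystem.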
Security
- CodeQL analysis completed: No security issues found
- All shell commands use proper error handling with `|| true` to prevent failures
- File operations safely handle missing files/directories
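To illustrate why the `|| true` guard matters under `set -euo pipefail`, here is a small sketch. The function name `run_diagnostics` and the path `/nonexistent-example-path` are hypothetical and exist only for this example; the workflow's real step captures more sections.

```shell
# Write a few diagnostic sections to the file given as $1.
# The body runs in a subshell so strict mode does not leak to the caller.
run_diagnostics() (
  set -euo pipefail
  out="$1"

  echo "==== df -h ====" > "$out"
  df -h >> "$out" 2>/dev/null || true

  echo "==== missing dir ====" >> "$out"
  # du exits non-zero for a missing path; without "|| true",
  # set -e would stop the script at this line.
  du -sh /nonexistent-example-path >> "$out" 2>/dev/null || true

  echo "==== after failure ====" >> "$out"
)
```

Calling `run_diagnostics diagnostics.txt` should leave all three section headers in the file even though the `du` command fails.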
Original prompt
Problem: The failing CI run (job 55935120817) may be caused by the runner running out of disk space while building snapcraft/Flutter artifacts. GitHub-hosted runners do not publish ephemeral-disk metrics per-run, so we need to record disk usage at runtime to confirm or rule out space exhaustion.
Change requested: Add diagnostic steps to .github/workflows/dynamic-ci.yml that capture disk usage and large files right before the heavy build steps (snapcraft / flutter build) and upload them as an artifact so they can be inspected after the run.
Target file: .github/workflows/dynamic-ci.yml (use ref 9c630ed129a024d6d97ebf1f50d9162c9053e8a5 to reference current workflow) Link: https://github.com/canonical/multipass/blob/9c630ed129a024d6d97ebf1f50d9162c9053e8a5/.github/workflows/dynamic-ci.yml
What to add: Insert the following two steps immediately before the steps that run snapcraft / the heavy build (or at minimum, before the failing build step). Use if: always() so the diagnostics are recorded on both success and failure; use continue-on-error inside the step to avoid failing the job because of diagnostics.
YAML snippet to add:
    - name: Dump runner disk diagnostics
      if: always()
      run: |
        set -euxo pipefail
        OUT=runner-disk-diagnostics.txt
        echo "==== df -h ====" > "$OUT"
        df -h >> "$OUT" || true
        echo "" >> "$OUT"
        echo "==== df -i ====" >> "$OUT"
        df -i >> "$OUT" || true
        echo "" >> "$OUT"
        echo "==== du top dirs ====" >> "$OUT"
        du -sh /root /home /tmp /var /usr "${RUNNER_TEMP:-/tmp}" "${RUNNER_WORKSPACE:-/github/workspace}" 2>/dev/null >> "$OUT" || true
        echo "" >> "$OUT"
        echo "==== top 50 largest entries on / (no other FS) ====" >> "$OUT"
        du -ahx / 2>/dev/null | sort -rh | head -n 50 >> "$OUT" || true
        echo "" >> "$OUT"
        echo "==== find files >50M ====" >> "$OUT"
        find / -xdev -type f -size +50M -exec ls -lh {} \; 2>/dev/null | sort -k5 -h | tail -n 50 >> "$OUT" || true
        echo "" >> "$OUT"
        # warn if available space < 2GB
        FREE_KB=$(df --output=avail -k / | tail -n1 | tr -d ' ')
        if [ -n "$FREE_KB" ] && [ "$FREE_KB" -lt $((2*1024*1024)) ]; then
          echo "##[warning] Less than 2GB available on / ($(($FREE_KB/1024)) MB)" >> "$OUT"
        fi
        # capture snapcraft log if present (log path seen in failing job)
        if ls /home/runner/.local/state/snapcraft/log/snapcraft-*.log 1> /dev/null 2>&1; then
          echo "" >> "$OUT"
          echo "==== snapcraft log tail ====" >> "$OUT"
          tail -n 400 /home/runner/.local/state/snapcraft/log/snapcraft-*.log >> "$OUT" || true
        fi
      continue-on-error: true
    - name: Upload runner disk diagnostics
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: runner-disk-diagnostics
        path: runner-disk-diagnostics.txt
Notes / rationale:
- Running these diagnostics will let you confirm whether the runner ran out of disk/inodes before or during the build that failed copying libflutter_linux_gtk.so.
- Place the steps before the snapcraft build or heavy Flutter build step; keep if: always() on them so the upload runs even after a job failure.
- The script captures df, inode usage, top directories, largest files, and any snapcraft logs referenced in the failing job logs.
- The artifact will be retained with the run and can be downloaded for inspection.
Deliverable: Create a branch, add the snippet to .github/workflows/dynamic-ci.yml, and open a pull request titled: "ci: add disk usage diagnostics to dynamic-ci.yml". The PR should include the exact YAML insertion and a short description linking to the failing run: https://github.com/canonical/multipass/actions/runs/19534052585/job/55935120817
If you want I can also:
- Add an automatic warning/early-fail when free space is below a configurable threshold (example uses 2GB), or
- Limit ccache size (ccache -M 1G) and report ccache size concurrently, or
- Place diagnostics both before and after particular steps to see growth during the job.
Please confirm you want me to create the PR with this change in canonical/multipass.
This pull request was created as a result of the prompt above, from Copilot chat.
@copilot please address the review above.
All review comments addressed in commit 1fc922a.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 89.47%. Comparing base (43d1350) to head (1fc922a).
:warning: Report is 15 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #4520 +/- ##
=======================================
Coverage 89.47% 89.47%
=======================================
Files 243 243
Lines 13877 13877
=======================================
Hits 12416 12416
Misses 1461 1461
We don't seem to be nearing space limits today.
Keeping this around for later.