balena-engine
balena-engine copied to clipboard
Ignore missing config.v2.json files on migration
TODO: One thing we should do in this PR: if every attempt of making the migration fails, simply remove all images and containers and let the device redownload everything. (This is actually how we designed the migration, but not the observed behavior.)
We have seen some cases in which the file
/var/lib/docker/containers/<UUID>/config.v2.json
was not found for
certain UUIDs. This causes the whole migration to fail and, worse,
causes our migration cleanup code to fail.
A missing config.v2.json
indicates that directory does not contain a
valid container, so we should not even try to migrate them. That's what
this commit does: it makes the migration ignore directories with a
missing config.v2.json
.
I don't think we should have directories like this in the first place, this is probably the side effect of some other issue. That said, Docker itself ignores these directories (after logging a warning) during startup, so here are just bringing the Engine in line with the standard behavior.
Resolves #301
- What I did
- How I did it
- How to verify it
- Description for the changelog
An error occurred whilst building your landr site preview:
{
"name": "Error",
"message": "Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build",
"stack": "Error: Command failed with code undefined: /usr/src/app/node_modules/gatsby/cli.js build\n at Object.exports.run (/usr/src/app/lib/build-runner.js:257:11)\n at async build (/usr/src/app/bot/index.js:132:19)\n at async /usr/src/app/bot/index.js:210:25\n at async Promise.all (index 0)\n at async middleware (/usr/src/app/node_modules/@octokit/webhooks/dist-node/index.js:355:5)"
}
Totally untested as of now, therefore keeping PR as a draft.
[alexgg] This has attached https://jel.ly.fish/9aaf5f96-265c-4c52-9e59-074e973b53f8
I added a test and some error handling but not quite ready to merge this as my tests aren't working when I run individual tests.
The migrate_test.go tests pass when I run all the tests: make test-unit
But these all fail with package errors and GOROOT/GOPATH errors so I need to sort that out.
make test-unit TESTDIRS=pkg
make test-unit TESTDIRS=pkg/storagemigration
make test-unit TESTDIRS=pkg/storagemigration/migrate_test.go
Hey @zoobot ! I think you need to include the ./
prefix:
make test-unit TESTDIRS=./pkg/storagemigration
That . worked! :) thanks @lmbarros
Adding this here as reference for all the changes to make this happen: https://github.com/balena-os/balena-engine/pull/168/files
And the improvement: https://jel.ly.fish/product-improvement-migrate-device-storage-driver-from-aufs-to-overlayfs-22cf335f-b1f7-44f8-853d-064681c69fd8
Update on this @lmbarros :
Was hoping to get a rpi3 to test on, Ross is sending one. Also trying to set up QEMU to test this on for quicker results, sorting out m1 QEMU issues.
Making changes today: Don't just delete single container layer that has missing config.v2.json
On migration failure due to missing config.v2.json: Remove all overlay2 layers, aufs layers remain
@lmbarros I made the changes to test and cleanup so it removes all the overlay2 folders/containers on rollback, instead of just removing the single layer with the missing config.v2.json.
Should we wait until I have a RP3 device or QEMU to test this on or roll it out with the current go test? Do you always test on devices first?
Thanks for looking
Also there's a typo in the changelog entry: "Rollback to ausf..."
Should we wait until I have a RP3 device or QEMU to test this on or roll it out with the current go test? Do you always test on devices first?
I often do this because I am paranoid and like the ritual, but no, not necessarily. In fact, I think it's generally more important to have a good automated test case (either here or in meta-balena) capable of ensuring that we really fixed the issue.