Using after leads to pull after instead of waiting for complete part
Bug Description
Take two parts, part1 and part2, where part2 declares after: part1. Quoting https://snapcraft.io/docs/parts-lifecycle#heading--step-dependencies:
By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, you can change this behavior using the after keyword in the definition of a part in snapcraft.yaml. This creates a dependency chain from one part to another.
One would expect snapcraft to run every step of part1 before starting part2. This is what happened "before" (maybe snapcraft 4.8?).
To Reproduce
snap run snapcraft
Environment
Reproduces on Launchpad; with snapcraft running on Ubuntu 22.04 in destructive mode; and with snapcraft running on Debian sid with LXD.
snapcraft.yaml
debug-symbols:
  plugin: nil
  build-packages:
    - python3
    - python3-virtualenv
  after:
    - firefox
  override-pull: |
    export SYMBOLS_ARCHIVE=$(find $SNAPCRAFT_STAGE/debug-symbols/ -type f -name "firefox-*.crashreporter-symbols.zip")
    if [ -f "$SYMBOLS_ARCHIVE" ]; then
      if [ -f "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" ]; then
        virtualenv venv/
        source venv/bin/activate
        venv/bin/pip3 install redo requests argparse
        SOCORRO_SYMBOL_UPLOAD_URL=https://symbols.stage.mozaws.net/upload/ SOCORRO_SYMBOL_UPLOAD_TOKEN_FILE="$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" venv/bin/python3 $SNAPCRAFT_STAGE/debug-symbols/upload_symbols.py $SYMBOLS_ARCHIVE
        rm "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token"
        deactivate
      else
        cp $SYMBOLS_ARCHIVE $SNAPCRAFT_PROJECT_DIR/${SNAPCRAFT_PROJECT_NAME}_${SNAPCRAFT_PROJECT_VERSION}_${SNAPCRAFT_TARGET_ARCH}.debug
      fi
    fi
Relevant log output
While testing with Firefox based on Core 22 (requiring newer snapcraft), logging shows clearly that this is not respected:
39137 Executed: pull firefox
39138 Executing parts lifecycle: pull debug-symbols
[...]
54552 :: + cp obj-x86_64-pc-linux-gnu/dist/firefox-114.0.1.en-US.linux-x86_64.crashreporter-symbols.zip /build/firefox/stage/debug-symbols/
Additional context
We need to build the debug symbols before copying them.
So either the doc is misleading, and part2.pull is expected to run right after part1.pull rather than after all steps of part1 have completed, or this has regressed.
I think the wording of that help section must be clearer.
Each lifecycle step depends on the completion of the previous step for that part, so to reach a desired step, all prior steps need to have successfully run. By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, you can change this behavior using the after keyword in the definition of a part in snapcraft.yaml. This creates a dependency chain from one part to another.
The part in bold lacks accuracy. OK, you introduce the keyword and say that the behavior described changes. But how exactly? What exactly does creating a dependency chain mean?
At this point anyone that needs to learn how this works probably concludes that after makes "all the lifecycle steps of X be executed before any lifecycle step of Y ever begins", but as observed by lissyx, this is probably not the case.
The true answer seems to be implied in the next paragraph.
In the above example, the part named grv will be built after the part named libgit2 has been successfully built and staged.
But as a reader I don't know if, for instance, "pulled" was omitted by oversight or intentionally.
So I believe the first quote must say something like:
By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, if you use after: X in the part Y, the build and stage steps (this doesn't include pull or prime) of X must be completed before the build and stage steps of Y are executed.
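To make the proposed wording concrete, here is a minimal sketch using the two parts from the bug description, part1 and part2 (the nil plugin is used only to keep the example short). The comments restate the ordering as reported in this thread for core22; they are not an authoritative description of snapcraft's scheduler.

```yaml
parts:
  part1:
    plugin: nil
  part2:
    plugin: nil
    after:
      - part1
# Ordering as reported in this thread on core22:
#   1. part1 and part2 are both pulled first; `after` does not delay part2's pull.
#   2. part1 is built and staged.
#   3. Only then is part2 built.
```

Under this reading, anything part2 needs from part1's staged output is only safe to use from part2's build step onward, not from its pull step.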
It's even more disturbing since it was working before :(
This change was made in core22, mostly driven by large sources being built on Launchpad. Switching to core20 would bring back the original behavior.
@cmatsuoka can you look into this?
Do you have a link to why and where this was done? I worry because we also had to use override-pull for some network-related reasons on GitHub Actions builds.
One motivation for this change was to isolate the pull step to allow offline builds, in the sense that everything could be pulled first and the rest of the package construction could be carried out without network access. One possible solution for this case could be to upload symbols in the build step of debug-symbols, which would be executed after the stage step of firefox.
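A rough sketch of that suggestion, reusing the variable names and paths from the recipe in the bug report. Only the copy branch is shown; the upload branch from the original override-pull would sit in the same place. This is an untested adaptation, not a confirmed fix.

```yaml
debug-symbols:
  plugin: nil
  build-packages:
    - python3
    - python3-virtualenv
  after:
    - firefox
  # Moved from override-pull: per this thread, the build step of this part
  # runs after firefox has been staged, so the archive should exist by now.
  override-build: |
    SYMBOLS_ARCHIVE=$(find "$SNAPCRAFT_STAGE/debug-symbols/" -type f -name "firefox-*.crashreporter-symbols.zip")
    if [ -f "$SYMBOLS_ARCHIVE" ]; then
      # The token check and upload from the original recipe would go here.
      cp "$SYMBOLS_ARCHIVE" "$SNAPCRAFT_PROJECT_DIR/${SNAPCRAFT_PROJECT_NAME}_${SNAPCRAFT_PROJECT_VERSION}_${SNAPCRAFT_TARGET_ARCH}.debug"
    fi
```

The trade-off is that the upload then depends on network access being available during the build step, which is the concern raised in the replies.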
Right, but you just documented the rationale why we did it on pull: network access is sure to be enabled at that point.
Are you planning to disable networking after pull in your build environment? Otherwise it should keep working (the idea of offline builds is to allow building without networking if necessary, not making it mandatory).
I am not in control of what happens on launchpad builds, @seb128 might know more.
So there's a 3h timeout on launchpad, but that's orthogonal to the build step in snapcraft, and we only care for this on GitHub Actions in fact.
Clarification of the documentation might still be a good thing.
I have a different use case. I build a Debian package in one part and consume it in another. With the new behavior, snapcraft complains that the Debian package is not found. This is a rewritten recipe that reproduces my case:
my-part:
  after:
    - my-part2
  source: $CRAFT_STAGE/distro-info-data.deb
  source-type: deb
  plugin: nil
my-part2:
  plugin: nil
  override-pull: |
    apt download distro-info-data
  override-build: |
    mv distro-info-data_*.deb $CRAFT_PART_INSTALL/distro-info-data.deb
    ls -l $CRAFT_PART_INSTALL
  prime:
    - -*
I can combine both parts, but prefer to separate them.
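For reference, a combined version might look roughly like the sketch below. It assumes dpkg-deb is available in the build environment and that exactly one .deb matches the glob. It is an illustration of the workaround, not the recipe actually in use.

```yaml
my-part:
  plugin: nil
  override-pull: |
    # Download the package during pull, as my-part2 did.
    apt download distro-info-data
  override-build: |
    # Unpack the package contents straight into the install directory
    # instead of handing the .deb to a second part through the stage area.
    dpkg-deb -x distro-info-data_*.deb "$CRAFT_PART_INSTALL"
    ls -l "$CRAFT_PART_INSTALL"
```

Because the download and the extraction are steps of the same part, they always run in order, so the pull-before-stage problem between parts does not arise.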
This behavior likely explains why we are missing some debug symbols on builds: firefox.override-pull can be executed before mozconfig.override-stage, thus overwriting the .mozconfig file.
We've rewritten our documentation to describe this behavior (pull runs on all parts before further steps).