
Using after leads to pull after instead of waiting for complete part

Open lissyx opened this issue 2 years ago • 13 comments

Bug Description

Consider two parts, part1 and part2, where part2 declares after: part1. The documentation at https://snapcraft.io/docs/parts-lifecycle#heading--step-dependencies says:

> By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, you can change this behavior using the after keyword in the definition of a part in snapcraft.yaml. This creates a dependency chain from one part to another.

From this, one would expect snapcraft to run every step of part1 to completion before starting part2. That is what used to happen (maybe with snapcraft 4.8?).
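A minimal project for observing the ordering might look like the sketch below. The project name and the `produced.txt` file are hypothetical. It assumes a core22 base and the `CRAFT_STAGE` and `CRAFT_PART_INSTALL` variables used elsewhere in this thread. Under the core22 behavior discussed in this thread, part2's pull runs before part1 has been built, while part2's build waits until part1 has been staged.

```yaml
name: after-order-demo   # hypothetical project
base: core22
version: "0.1"
summary: Shows when a part that uses after is pulled
description: Shows when a part that uses after is pulled
grade: devel
confinement: strict

parts:
  part1:
    plugin: nil
    override-build: |
      # Produces a file that part2 wants to consume.
      echo hello > "$CRAFT_PART_INSTALL/produced.txt"

  part2:
    plugin: nil
    after:
      - part1
    override-pull: |
      # With core22 this runs before part1 is built or staged,
      # so the file is expected to be missing here.
      ls "$CRAFT_STAGE/produced.txt" || echo "produced.txt not staged yet"
    override-build: |
      # By the build step, part1 has been staged.
      cat "$CRAFT_STAGE/produced.txt"
```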

To Reproduce

snap run snapcraft

Environment

Reproduces on Launchpad, with snapcraft on Ubuntu 22.04 in destructive mode, and with snapcraft on Debian sid using LXD.

snapcraft.yaml

debug-symbols:
    plugin: nil
    build-packages:
      - python3
      - python3-virtualenv
    after:
      - firefox
    override-pull: |
      export SYMBOLS_ARCHIVE=$(find $SNAPCRAFT_STAGE/debug-symbols/ -type f -name "firefox-*.crashreporter-symbols.zip")
      if [ -f "$SYMBOLS_ARCHIVE" ]; then
        if [ -f "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" ]; then
          virtualenv venv/
          source venv/bin/activate
          venv/bin/pip3 install redo requests argparse
          SOCORRO_SYMBOL_UPLOAD_URL=https://symbols.stage.mozaws.net/upload/ SOCORRO_SYMBOL_UPLOAD_TOKEN_FILE="$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" venv/bin/python3 $SNAPCRAFT_STAGE/debug-symbols/upload_symbols.py $SYMBOLS_ARCHIVE
          rm "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token"
          deactivate
        else
          # ${VAR} braces keep "_" from being read as part of the variable name
          cp "$SYMBOLS_ARCHIVE" "$SNAPCRAFT_PROJECT_DIR/${SNAPCRAFT_PROJECT_NAME}_${SNAPCRAFT_PROJECT_VERSION}_${SNAPCRAFT_TARGET_ARCH}.debug"
        fi
      fi

Relevant log output

While testing Firefox on core22 (which requires a newer snapcraft), the log clearly shows that this ordering is not respected:

  39137 Executed: pull firefox                                                          
  39138 Executing parts lifecycle: pull debug-symbols                                   

[...]

  54552 :: + cp obj-x86_64-pc-linux-gnu/dist/firefox-114.0.1.en-US.linux-x86_64.crashreporter-symbols.zip /build/firefox/stage/debug-symbols/


Additional context

The firefox part has to build the debug symbols and copy them to the stage directory before the debug-symbols part can use them.

lissyx, Jun 19 '23 16:06

So one of two things is true. Either the documentation is misleading, and part2's pull is expected to run right after part1's pull rather than after all of part1's steps have completed, or this behavior has regressed.

lissyx, Jun 19 '23 16:06

I think the wording of that help section must be clearer.

> Each lifecycle step depends on the completion of the previous step for that part, so to reach a desired step, all prior steps need to have successfully run. By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, you can change this behavior using the after keyword in the definition of a part in snapcraft.yaml. This creates a dependency chain from one part to another.

The part about the after keyword lacks precision. It introduces the keyword and says that the behavior changes, but how exactly? What does creating a dependency chain actually mean?

At this point, anyone learning how this works will probably conclude that after makes "all the lifecycle steps of X be executed before any lifecycle step of Y ever begins". However, as lissyx observed, this is apparently not the case.

The true answer seems to be implied in the next paragraph.

> In the above example, the part named grv will be built after the part named libgit2 has been successfully built and staged.

But as a reader, I can't tell whether "pulled" was left out of that sentence by oversight or on purpose.

So I believe the first quote must say something like

> By default, snapcraft runs the same lifecycle step of all parts before moving to the next step. However, if you use after: X in the part Y, the build and stage steps (this doesn't include pull or prime) of X must be completed before the build and stage steps of Y are executed.

nteodosio, Jun 19 '23 16:06

It's even more disturbing since it was working before :(

lissyx, Jun 19 '23 18:06

This change was made in core22, mostly driven by large sources being built on Launchpad. Switching to core20 would bring back the original behavior.

sergiusens, Jun 19 '23 21:06

@cmatsuoka can you look into this?

sergiusens, Jun 19 '23 21:06

> This change was done in core22, mostly driven by large sources being built on launchpad, in core22, switching to core20 would bring back the original behavior.

Do you have a link explaining why / where this was done? I'm worried because we also had to use override-pull for some network-related reasons on GitHub Actions builds.

lissyx, Jun 20 '23 04:06

One motivation for this change was to isolate the pull step to allow offline builds: everything can be pulled first, and the rest of the package can then be built without network access. One possible solution for this case could be to upload the symbols in the build step of debug-symbols, which would be executed after the stage step of firefox.

cmatsuoka, Jun 20 '23 14:06
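Following cmatsuoka's suggestion, the debug-symbols part could move its logic from override-pull to override-build. The sketch below makes three assumptions. First, network access is still available during the build step, since offline builds are optional. Second, the firefox part, which is not shown, still stages the symbols archive under debug-symbols/. Third, the SNAPCRAFT_* variables behave as in the original snippet. The sketch calls the virtualenv's own pip3 and python3 directly, so there is no need to run source or deactivate.

```yaml
parts:
  debug-symbols:
    plugin: nil
    build-packages:
      - python3
      - python3-virtualenv
    after:
      - firefox
    # Moved from override-pull: with "after: [firefox]", this part's build
    # step only starts once firefox has been built and staged, so the
    # symbols archive should already be in the stage directory.
    override-build: |
      export SYMBOLS_ARCHIVE=$(find "$SNAPCRAFT_STAGE/debug-symbols/" -type f -name "firefox-*.crashreporter-symbols.zip")
      if [ -f "$SYMBOLS_ARCHIVE" ]; then
        if [ -f "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" ]; then
          # Assumes network access is still available during the build step.
          virtualenv venv/
          # Calling the venv's own pip3/python3 makes activation unnecessary.
          venv/bin/pip3 install redo requests argparse
          SOCORRO_SYMBOL_UPLOAD_URL=https://symbols.stage.mozaws.net/upload/ \
          SOCORRO_SYMBOL_UPLOAD_TOKEN_FILE="$SNAPCRAFT_PROJECT_DIR/symbols-upload-token" \
            venv/bin/python3 "$SNAPCRAFT_STAGE/debug-symbols/upload_symbols.py" "$SYMBOLS_ARCHIVE"
          rm "$SNAPCRAFT_PROJECT_DIR/symbols-upload-token"
        else
          cp "$SYMBOLS_ARCHIVE" "$SNAPCRAFT_PROJECT_DIR/${SNAPCRAFT_PROJECT_NAME}_${SNAPCRAFT_PROJECT_VERSION}_${SNAPCRAFT_TARGET_ARCH}.debug"
        fi
      fi
```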

> One motivation for this change was to isolate the pull step to allow offline builds, in the sense that everything could be pulled first, and the rest of the package construction could be carried without network access. One possible solution for this case could be to upload symbols in the build step of debug-symbols, which would be executed after the stage step of firefox.

Right, but you just described the reason we did it in the pull step: network access is guaranteed to be available at that point.

lissyx, Jun 20 '23 14:06

Are you planning to disable networking after pull in your build environment? If not, it should keep working (the idea of offline builds is to allow building without networking when necessary, not to make it mandatory).

cmatsuoka, Jun 20 '23 14:06

> Are you planning to disable networking after pull in your build environment? Otherwise it should keep working (the idea of offline builds is to allow building without networking if necessary, not making it mandatory).

I am not in control of what happens on Launchpad builds; @seb128 might know more.

lissyx, Jun 20 '23 14:06

There is a 3h timeout on Launchpad, but that is orthogonal to the build step in snapcraft. In fact, we only care about this on GitHub Actions.

Clarification of the documentation might still be a good thing.

lissyx, Jun 20 '23 15:06

I have a different use case: I build a Debian package in one part and consume it in another. With the new behavior, snapcraft complains that the Debian package is not found. Here is a rewritten recipe that reproduces my case.

  my-part:
    after:
      - my-part2
    source: $CRAFT_STAGE/distro-info-data.deb
    source-type: deb
    plugin: nil

  my-part2:
    plugin: nil
    override-pull: |
      apt download distro-info-data
    override-build: |
      mv distro-info-data_*.deb $CRAFT_PART_INSTALL/distro-info-data.deb
      ls -l $CRAFT_PART_INSTALL
    prime:
      - -*

I could combine both parts, but I prefer to keep them separate.

tsunghanliu, Jul 07 '23 13:07
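One possible workaround is sketched below. It is not taken from the thread. The problem is that my-part's `source` is resolved during pull, which happens before my-part2 is staged. To avoid this, my-part can drop `source` and unpack the staged .deb itself in its build step, which runs only after my-part2 has been staged. The sketch assumes that `dpkg-deb` is available in the build environment and that the goal is to ship the package's files. It reuses the `CRAFT_*` variables from the recipe above.

```yaml
parts:
  my-part:
    after:
      - my-part2
    plugin: nil
    # No "source": the .deb only reaches the stage directory once my-part2
    # has been staged, which happens after every part has been pulled.
    override-build: |
      # Extract the package's file tree into this part's install directory.
      dpkg-deb -x "$CRAFT_STAGE/distro-info-data.deb" "$CRAFT_PART_INSTALL"

  my-part2:
    plugin: nil
    override-pull: |
      apt download distro-info-data
    override-build: |
      mv distro-info-data_*.deb "$CRAFT_PART_INSTALL/distro-info-data.deb"
    prime:
      - -*
```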

This behavior probably explains why some debug symbols are missing from our builds: firefox.override-pull can run before mozconfig.override-stage, which then overwrites the .mozconfig file.

lissyx, Sep 19 '23 15:09

We've rewritten our documentation to describe this behavior (pull runs on all parts before further steps).

mr-cal, Apr 24 '25 14:04