sbt-native-packager Random LABEL snp-multi-stage-id invalidates docker cache

Expected behaviour

I'm currently writing a second post about Docker cache efficiency with sbt. My main focus is to use the Docker cache in CI environments as much as possible. I have already managed great improvements, but there is a problem with the randomly generated labels.

LABEL snp-multi-stage="intermediate"
LABEL snp-multi-stage-id="44857d33-aef2-4d80-b811-7d1ed9b1891d"

Executing the second command always invalidates the cache, which is not always expected, especially when dockerAutoremoveMultiStageIntermediateImages := false is used. I suggest using something deterministic, like stage0-(packageName in Docker).value
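A minimal build.sbt sketch of that suggestion, assuming the plugin emits the id as a single `Cmd("LABEL", ...)` whose first argument starts with `snp-multi-stage-id=` (which matches the Dockerfile output shown in this issue). The import path and the `deterministicId` name are assumptions for illustration, not something the plugin provides. Note that the plugin's automatic removal of intermediate images looks for the id it generated itself, so this sketch is only meant for builds that set dockerAutoremoveMultiStageIntermediateImages := false.

```scala
// build.sbt fragment (assumes the DockerPlugin from sbt-native-packager is enabled)
import com.typesafe.sbt.packager.docker.Cmd // package path may differ between plugin versions

dockerCommands := {
  // Hypothetical deterministic id, following the naming proposed in this issue
  val deterministicId = "stage0-" + (packageName in Docker).value

  dockerCommands.value.map {
    // Swap the random id label for a stable one so the layer can be cached
    case Cmd("LABEL", args @ _*)
        if args.headOption.exists(_.startsWith("snp-multi-stage-id=")) =>
      Cmd("LABEL", "snp-multi-stage-id=\"" + deterministicId + "\"")
    // Leave every other command untouched
    case other => other
  }
}
```

With a stable value in the LABEL instruction, Docker should be able to reuse the cached layer for that step as long as the preceding steps are unchanged.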

Actual behaviour

Step 1/20 : FROM repo.mycompany.com/team/openjre:8u242 as stage0
[info]  ---> a020ce624573
[info] Step 2/20 : LABEL snp-multi-stage="intermediate"
[info]  ---> Using cache
[info]  ---> 9abeed0b5a9f
[info] Step 3/20 : LABEL snp-multi-stage-id="44857d33-aef2-4d80-b811-7d1ed9b1891d"
[info]  ---> Running in 072dfdce55ad
[info] Removing intermediate container 072dfdce55ad
[info]  ---> 554d25368747
[info] Step 4/20 : WORKDIR /opt/my-app
[info]  ---> Running in 49f3e275dc8c

As you can see, the cache works for the first label but is invalidated after the second, random label. @mkurz What do you think about a deterministic label?

Apr 05 '20 11:04 ppiotrow

@ppiotrow I think I can live with a deterministic label. Actually, there was already a discussion about whether we should use a random id (like we do now) or something more deterministic. Please have a look at the comment here and also my answer. As you can see, my main argument was that I want to avoid any side effects if possible. For example, creating an image fails and a user may want to inspect it later; however, if you now run another build and that succeeds, with a deterministic label it will also delete the previous build image, which we actually wanted to keep. However, I think switching to a deterministic label for caching purposes would be an acceptable compromise if the win is much higher, performance- and disk-space-wise. WDYT? Will it be worth it?

Apr 05 '20 20:04 mkurz

I like the existing idea of having two layers: snp-multi-stage to wipe out all intermediate layers from sbt docker builds, and snp-multi-stage-id to handle only the build-specific image. I don't really follow the argument about inspecting the image later, but this is influenced by my environment. I usually run builds on docker-in-docker CI servers, so unpushed images are simply gone for me. If I ran a build locally, I'd inspect it right after it failed.

Caching capabilities, reproducible (non-random) builds, and simpler unit tests matter more from my point of view. I'd like to hear the opinion of someone with a different CI setup.

Apr 06 '20 07:04 ppiotrow

@ppiotrow Let's just change snp-multi-stage-id to something deterministic. I am fine with that. However, I will not do that work myself; I am too busy right now.

Apr 06 '20 07:04 mkurz

If you can live without those labels, it's possible to simply remove them as a workaround

dockerCommands := dockerCommands.value.filter {
  case Cmd("LABEL", args @ _*) => args.head.startsWith("snp-multi-stage")
  case _                       => true
}

Aug 13 '20 10:08 stoiev

If you can live without those labels, it's possible to simply remove them as a workaround
dockerCommands := dockerCommands.value.filter {
  case Cmd("LABEL", args @ _*) => args.head.startsWith("snp-multi-stage")
  case _                       => true
}

Thanks for the workaround! There's just a negation missing, it should be: case Cmd("LABEL", args @ _*) => !args.head.startsWith("snp-multi-stage")
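Put together, the corrected workaround might look like the sketch below. The import path is an assumption and may differ between plugin versions; `headOption` is used instead of `head` only to avoid an exception on a LABEL command without arguments. Since this drops both labels, it only makes sense when the automatic removal of intermediate images is not relied upon.

```scala
// build.sbt fragment (assumes the DockerPlugin from sbt-native-packager is enabled)
import com.typesafe.sbt.packager.docker.Cmd // package path may differ between plugin versions

dockerCommands := dockerCommands.value.filter {
  // Keep a LABEL only if it is NOT one of the snp-multi-stage labels
  case Cmd("LABEL", args @ _*) => !args.headOption.exists(_.startsWith("snp-multi-stage"))
  // Keep all other Dockerfile commands
  case _                       => true
}
```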

In general, I believe a deterministic id should be the default. More users are concerned with fast builds than with inspecting their failed builds.

Aug 17 '21 11:08 an-tex
