sbt-native-packager

Random LABEL snp-multi-stage-id invalidates docker cache

Open ppiotrow opened this issue 4 years ago • 5 comments

Expected behaviour

I'm currently writing a second post about Docker cache efficiency with sbt. My main focus is to use the Docker cache in CI environments as much as possible. I have already achieved great improvements, but there is a problem with the randomly generated labels.

LABEL snp-multi-stage="intermediate"
LABEL snp-multi-stage-id="44857d33-aef2-4d80-b811-7d1ed9b1891d"

Because the id is random on every build, the second LABEL instruction always invalidates the cache, which is not always desired, especially when dockerAutoremoveMultiStageIntermediateImages := false is used. I suggest using something deterministic, like stage0-(packageName in Docker).value
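Until the default changes, the suggestion above can be approximated in a project's own build. The following is a minimal build.sbt sketch, not the plugin's implementation. It assumes DockerPlugin is enabled and that the label is emitted as a Cmd("LABEL", ...) whose first argument starts with snp-multi-stage-id. The name stableId is hypothetical. It also assumes auto-removal is switched off, since the plugin's cleanup may still look for the random id.

```scala
// build.sbt fragment (sketch; assumes DockerPlugin is enabled for this project)
import com.typesafe.sbt.packager.docker.Cmd

// Assumption: keep intermediate images so a later build can reuse them as cache.
dockerAutoremoveMultiStageIntermediateImages := false

dockerCommands := {
  // Hypothetical deterministic id, following the stage0-<packageName> idea.
  val stableId = "stage0-" + (Docker / packageName).value
  dockerCommands.value.map {
    // Replace only the random per-build label; leave every other command untouched.
    case Cmd("LABEL", args @ _*)
        if args.headOption.exists(_.startsWith("snp-multi-stage-id")) =>
      Cmd("LABEL", "snp-multi-stage-id=\"" + stableId + "\"")
    case other => other
  }
}
```

With an identical label value on every build, the LABEL instruction no longer changes between runs, so Docker can reuse the cached layers that follow it.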

Actual behaviour

Step 1/20 : FROM repo.mycompany.com/team/openjre:8u242 as stage0
[info]  ---> a020ce624573
[info] Step 2/20 : LABEL snp-multi-stage="intermediate"
[info]  ---> Using cache
[info]  ---> 9abeed0b5a9f
[info] Step 3/20 : LABEL snp-multi-stage-id="44857d33-aef2-4d80-b811-7d1ed9b1891d"
[info]  ---> Running in 072dfdce55ad
[info] Removing intermediate container 072dfdce55ad
[info]  ---> 554d25368747
[info] Step 4/20 : WORKDIR /opt/my-app
[info]  ---> Running in 49f3e275dc8c

As you can see, the cache works for the first label but is invalidated by the second, random label. @mkurz What do you think about a deterministic label?

ppiotrow avatar Apr 05 '20 11:04 ppiotrow

@ppiotrow I think I can live with a deterministic label. There was already a discussion about whether we should use a random id (as we do now) or something more deterministic. Please have a look at the comment here and also at my answer. As you can see, my main argument was that I want to avoid side effects where possible. For example, creating an image fails and a user may want to inspect it later. If you then run another build that succeeds, a deterministic label means the cleanup also deletes the image from the previous build, which we actually wanted to keep. However, I think switching to a deterministic label for caching purposes would be an acceptable compromise if the win in performance and disk space is much higher. WDYT? Will it be worth it?

mkurz avatar Apr 05 '20 20:04 mkurz

I like the existing idea of having two layers: snp-multi-stage to wipe out all intermediate layers from sbt Docker builds, and snp-multi-stage-id to handle only the image of a specific build. I don't really follow the argument about inspecting the image later, but that is influenced by my environment. I usually run builds on Docker-in-Docker CI servers, where unpushed images are simply gone. If I run a build locally, I inspect the image right after it fails.

From my point of view, better caching, reproducible (non-random) builds, and simpler unit tests matter more. I'd like to hear the opinion of someone with a different CI setup.

ppiotrow avatar Apr 06 '20 07:04 ppiotrow

@ppiotrow Let's just change snp-multi-stage-id to something deterministic. I am fine with that. However, I will not do that work myself; I am too busy right now.

mkurz avatar Apr 06 '20 07:04 mkurz

If you can live without those labels, it's possible to simply remove them as a workaround:

dockerCommands := dockerCommands.value.filter {
  case Cmd("LABEL", args @ _*) => args.head.startsWith("snp-multi-stage")
  case _                       => true
}

stoiev avatar Aug 13 '20 10:08 stoiev

Thanks for the workaround! There's just a negation missing. As written, the filter keeps only the snp-multi-stage labels and drops every other LABEL. It should be: case Cmd("LABEL", args @ _*) => !args.head.startsWith("snp-multi-stage")
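Put together, the corrected workaround might look like the sketch below. It assumes DockerPlugin is enabled and that Cmd comes from com.typesafe.sbt.packager.docker. Using filterNot and headOption is a stylistic choice here, not part of the original snippet. It avoids the easy-to-miss negation and does not fail on a LABEL without arguments.

```scala
// build.sbt fragment (sketch; assumes DockerPlugin is enabled for this project)
import com.typesafe.sbt.packager.docker.Cmd

dockerCommands := dockerCommands.value.filterNot {
  // Drop both snp-multi-stage and snp-multi-stage-id labels.
  case Cmd("LABEL", args @ _*) =>
    args.headOption.exists(_.startsWith("snp-multi-stage"))
  // Keep every other command, including unrelated LABEL instructions.
  case _ => false
}
```

Note that removing these labels also removes what the plugin uses to identify intermediate images, so this only makes sense if you can live without that cleanup.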

In general, I believe a deterministic id should be the default. More users care about fast builds than about inspecting their failed builds.

an-tex avatar Aug 17 '21 11:08 an-tex