
Don't pull compose images

byroot opened this issue 1 year ago • 11 comments

That pull takes 30-40s per job, which is quite insane when these images rarely if ever change.

Of course, if they significantly change it would cause problems, but then we should probably find another way to deal with this, like running a specific tag that is committed in this repo and that we manually update once in a while.
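
Something like this, just to illustrate the idea (the env file and variable names are made up, nothing like this exists in the repo yet):

# .buildkite/service-images.env would be committed here and bumped manually, e.g.:
#   POSTGRES_TAG=15.4
#   MYSQL_TAG=8.0.36
# and docker-compose.yml would reference image: postgres:${POSTGRES_TAG} etc.
docker compose --env-file .buildkite/service-images.env \
  -f .buildkite/docker-compose.yml up -d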

Ref: https://github.com/rails/buildkite-config/pull/73#discussion_r1484091872

cc @zzak

byroot avatar Feb 09 '24 11:02 byroot

Doesn't seem to work that well, as now compose is pulling as part of up.

The docs suggest there is an option to disable that:

--pull	policy	Pull image before running ("always"|"missing"|"never")

I suspect we need missing here, but it's unclear what the default is... I also need to figure out how to hook that into the pipeline generation DSL.
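
i.e. something along these lines (assuming the Compose version we run supports --pull on up, per the help output above):

# "missing" should skip the pull when the image is already in the local cache
docker compose -f .buildkite/docker-compose.yml up -d --pull missing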

byroot avatar Feb 09 '24 11:02 byroot

I suspect we need missing here,

Arf, https://docs.docker.com/compose/compose-file/05-services/#pull_policy suggests it's the default behavior. It's really weird; I'd expect all nodes to already have these images in cache and not need to pull.

byroot avatar Feb 09 '24 11:02 byroot

Hum, wait:

docker-compose -f .buildkite/docker-compose.yml -p buildkite018d8da4653043ae9bbd12e7145077ce up -d --scale default=0 default
Creating network "buildkite018d8da4653043ae9bbd12e7145077ce_default" with the default driver
Pulling default (973266071021.dkr.ecr.us-east-1.amazonaws.com/builds:ruby-3-3-018d8da2-6d6b-4eb1-815f-4ce77a885048)...
ruby-3-3-018d8da2-6d6b-4eb1-815f-4ce77a885048: Pulling from builds

So each build has a different image for backing services? That seems wrong.

byroot avatar Feb 09 '24 11:02 byroot

So each build has a different image

Yes, that's the image created by the ruby:3.3 (etc) job, containing the current repo contents, the installed gems, and the built JS packages.

Jobs will not infrequently be the first thing running on an instance, meaning they do need to pull everything, but the rest of the time they'll be building atop an already-cached set of layers and just pulling the final ref-specific layer(s).

matthewd avatar Feb 09 '24 14:02 matthewd

Right, I figured that out a bit later, but it confused me because I wasn't expecting compose to deal with that (but again, I'm super unfamiliar with compose).

It seems like my PR works as intended, just maybe not quite as effectively as I hoped?

What do you think?

byroot avatar Feb 09 '24 14:02 byroot

IIRC the explicit pull just separates the pull of the "build image" from the dependencies (postgres or whatever) into separate groups in the log output, so they get their time accounted separately. I don't think it adds time, so I think it's just a small cost (extra bit of config) for an even smaller benefit (extra log line) -- I'm not attached to it.

matthewd avatar Feb 09 '24 20:02 matthewd

I don't think it adds time

It depends. When you docker run foo, if the foo tag exists locally, there's no pull. If you docker pull foo, it's going to query the registry and potentially update the local version of foo.
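
Concretely (illustrative commands, not what the pipeline runs):

docker run --rm ruby:3.3 ruby -v   # reuses the locally cached ruby:3.3 if it exists; no registry round-trip
docker pull ruby:3.3               # always queries the registry and may download newer layers for that tag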

But agreed that in most cases it shouldn't be that long, unless the image changed. I guess I overlooked that this pull included not only the "services" images but also the main "ruby" image, which changes way more frequently.

To really optimize this, we'd need to make these images change less often (e.g. use a digest of the Gemfile.lock as the tag instead of the build id).
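
Roughly (just a sketch; the registry path is the one from the log above, the tag naming is made up):

LOCK_DIGEST="$(sha256sum Gemfile.lock | cut -c1-12)"
IMAGE="973266071021.dkr.ecr.us-east-1.amazonaws.com/builds:ruby-3-3-gems-${LOCK_DIGEST}"

# only build and push when no image exists for this Gemfile.lock yet
if ! docker manifest inspect "$IMAGE" > /dev/null 2>&1; then
  docker build -t "$IMAGE" .
  docker push "$IMAGE"
fi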

It's quite a bit more involved.

byroot avatar Feb 11 '24 21:02 byroot

To really optimize this, we'd need to make these images change less often (e.g. use a digest of the Gemfile.lock as the tag instead of the build id).

We do already use a lock-only layer. If we stopped the image there, we'd have to clone the repo on the workers, and last time I tried that seemed to come out slower (given we're semi-often building on a clean machine).

matthewd avatar Feb 12 '24 00:02 matthewd

Yeah, you're right that it depends on a lot of factors such as this one.

Maybe it was just my builds at some point, but they all seemed to pull for 30-40s; I'm pretty sure we could clone faster than that. But I was more in search of a quick win than willing to rearchitect the whole thing.

byroot avatar Feb 12 '24 00:02 byroot

Yeah, the docker pull definitely takes a lot longer than I'd expect... but IIRC so does the git clone :confused:

(I do, as is my wont, have vague long-term plans to improve things here.)

matthewd avatar Feb 12 '24 00:02 matthewd

FYI: buildkite-plugins/docker-compose-buildkite-plugin#419

images will not be pulled by the plugin (docker should take care of that)

zzak avatar Feb 28 '24 06:02 zzak