mybinder.org-deploy
Pre-pull popular images on nodes to make mybinder.org faster
Right now, an image is pulled onto a node the first time something using it is launched there. The pull is triggered by user action, so the user has to wait for it and the launch is slow.
Instead, we should have a daemonset running on each node that will:
- Look at running user pods across the cluster
- Collect all the images used by the running pods, along with a count of the pods running each image
- Figure out which of those images don't exist on the current node
- Apply some heuristics to figure out which images that don't exist on the current node are likely to end up on this node, and pull them.
This way, when a user launches an image, there's a good chance it already exists on the node.
To begin with, we can just pull any image that isn't already on our node, starting at the most-used image and going down. We can then see what this gives us...
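A minimal sketch of that first-pass heuristic, assuming we already have the list of images used by running user pods and the set of images present on the current node. How those two inputs are gathered (Kubernetes API, container runtime) is left out, and the function name `images_to_pull` and the image names are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Set


def images_to_pull(running_pod_images: Iterable[str], images_on_node: Set[str]) -> List[str]:
    """Return images missing from this node, most-used first.

    running_pod_images: one entry per running user pod (so repeats are expected).
    images_on_node: images already present on the current node.
    """
    # Count how many running pods use each image.
    counts = Counter(running_pod_images)
    # most_common() sorts by pod count, highest first; skip images we already have.
    return [image for image, _ in counts.most_common() if image not in images_on_node]


# Hypothetical example data, not real mybinder.org images.
running = ["repo-a:1", "repo-b:1", "repo-a:1", "repo-c:1", "repo-a:1", "repo-b:1"]
on_node = {"repo-b:1"}
print(images_to_pull(running, on_node))
```

The daemonset would then pull images from the front of this list. Any smarter heuristic (for example, a minimum pod count before an image is worth pulling) could be added as a filter on `counts`.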
The kubelet should now garbage collect images when a node starts running out of disk space, and we can also do more aggressive garbage collection if needed (we might already be doing it?)
Could we increase the weight in the scheduling decision to pick a node that already has the image present? Wondering if that would be a temporary fix to let us learn how much it would improve things before writing the daemonset. I am unsure how much we'd learn though :-/
I was looking at the number of launches for repos from the last ten days. This is the top 20:
I haven't (yet) computed what percentage of all launches these top 20 cover. My conclusion is that these repos change rarely and, just by virtue of being so popular, are already present on all nodes in our cluster. So if we want to pre-pull images, do we need to dive into the "long tail"?
The super popular repos have a lot more "launches" than "builds". Maybe we should identify the repos that are rebuilt frequently and target those? In that context: when we build a new repo we have to push the image to the registry and then pull it again, even if the scheduler picks the node the build ran on, because the build happens in a separate dockerd. So as a "builder" you pay twice. Is there a way we could move images from the build dockerd to the host dockerd (or is it already shared?)
For completeness: over the last ten days the top 20 repos were responsible for 80% of all launches.
There were 130866 launches in total over that time period, and 104850 of these were for one of the top 20 repos.
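As a quick check of that figure, and to see how many launches are left for the "long tail", here is the arithmetic using only the two numbers quoted above (the variable names are just for illustration):

```python
total_launches = 130866   # all launches over the last ten days
top20_launches = 104850   # launches of one of the top 20 repos

# Share of launches covered by the top 20 repos.
coverage = top20_launches / total_launches
# Launches that belong to every other repo (the "long tail").
long_tail_launches = total_launches - top20_launches

print(f"top 20 coverage: {coverage:.1%}")
print(f"long tail launches: {long_tail_launches}")
```

So roughly one launch in five over that period was for a repo outside the top 20.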