cortex
Consider adding in-cluster image caching layer
The primary motivation for this would be performance and reducing load on the external image registry. One thing to test before investing in it is just how much faster it would be vs. ECR in the same region.
We would need to make sure to do it in a way that doesn't jeopardize reliability; perhaps we can add a dedicated node for this (and if it goes down, we fall back to the remote registry).
Also related: we looked into supporting specifying backup image registries (#1995), and ran into a bit of a roadblock, but perhaps it’s gotten more feasible since then.
It would be very interesting to see if this is indeed faster. We use fairly large images (averaging around 16GB), and the network overhead adds a heavy cost at startup.
@creatorrr just a passing thought: depending on whether you run Python code in your container or not, you might be able to reduce the size of your image considerably with a tool like https://github.com/google/subpar. Haven't used it yet, just stumbled upon it a few days ago. 16GB is a heck of a lot!
Yup yup. It's basically because we bundle ML models within the image itself, so it's not the runtime that's causing the bloat (although that could use some optimisation too). We've tried using better compression, but that didn't yield significant benefits.
By the way, what’s the best way to measure time spent by the service in different stages during startup? Just looking at the logs?
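Logs are a reasonable starting point. For in-process stages (anything after the entrypoint runs, e.g. model loading), a minimal sketch of instrumenting them in Python could look like the following; the stage names and the `StageTimer` helper are hypothetical, not part of cortex:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("startup")


class StageTimer:
    """Log elapsed wall-clock time for each named startup stage."""

    def __init__(self):
        self._start = time.monotonic()
        self._last = self._start
        self.durations = {}  # stage name -> seconds

    def mark(self, stage):
        now = time.monotonic()
        self.durations[stage] = now - self._last
        log.info(
            "%s took %.2fs (%.2fs since start)",
            stage,
            now - self._last,
            now - self._start,
        )
        self._last = now


timer = StageTimer()
# load_models()  # hypothetical stage: loading the bundled ML models
timer.mark("load_models")
# warm_up()      # hypothetical stage: e.g. a first dummy inference
timer.mark("warm_up")
```

Note this only covers time inside the container process; the image pull and layer extraction happen before the entrypoint starts, so those would need to be measured from the orchestrator's events or node-level logs instead.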