talos
feat: implement distributed image store
This is a PoC/experiment.
Each Talos node runs a registryd component, which acts as both a registry and a fan-out service. For local requests, registryd serves manifests/blobs from the containerd content store. For incoming requests, registryd fans out to the other nodes (cluster members) and uses the first one that has the content.
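The fan-out step can be illustrated with a minimal sketch. It is not the PoC's code: the function name, the `has_content` callback and the peer addresses are all hypothetical, and peers are probed sequentially here, which the PoC may or may not do.

```python
from typing import Callable, Iterable, Optional


def find_peer_with_content(
    digest: str,
    peers: Iterable[str],
    has_content: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first peer that reports having `digest`, or None."""
    for peer in peers:
        try:
            if has_content(peer, digest):
                return peer
        except OSError:
            # An unreachable peer is treated as a miss; try the next one.
            continue
    return None


# In-memory stand-in for the per-node content stores (hypothetical addresses).
stores = {
    "10.0.0.2": set(),
    "10.0.0.3": {"sha256:abc"},
    "10.0.0.4": {"sha256:abc"},
}


def probe(peer: str, digest: str) -> bool:
    return digest in stores[peer]


hit = find_peer_with_content("sha256:abc", stores, probe)
miss = find_peer_with_content("sha256:def", stores, probe)
print(hit, miss)
```

When no peer has the content, the sketch returns `None`; in the setup described here that corresponds to registryd reporting a miss so that containerd moves on to the next mirror endpoint.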
I had to disable content store deduplication, as otherwise containerd drops original layers immediately.
One question that is not fully solved is how to inject registryd. In my testing I injected it as the first endpoint in the registry mirror scheme, so if registryd has nothing, containerd falls back to the "upstream" registry/mirror. Some work is still needed to support this for * redirects.
There are unresolved issues with images protected by authorization. At the moment registryd never resolves tags (it defers that to the upstream registry), but it might still deliver images without pull secrets to anyone who supplies the proper digest.
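To make the tag/digest distinction concrete, here is a hedged sketch of how a request could be classified. It assumes the standard registry API paths (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`) and, as a simplification, recognises only sha256 digests; the function name is hypothetical and not taken from the PoC.

```python
import re
from typing import Optional

# /v2/<name>/manifests/<reference> or /v2/<name>/blobs/<digest>
_PATH = re.compile(r"^/v2/(?P<name>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$")
# Simplification: only sha256 digests are recognised in this sketch.
_DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")


def digest_for_request(path: str) -> Optional[str]:
    """Return the digest if the request addresses content by digest, else None."""
    match = _PATH.match(path)
    if match is None:
        return None
    ref = match.group("ref")
    return ref if _DIGEST.match(ref) else None


digest = "sha256:" + "a" * 64
by_tag = digest_for_request("/v2/library/nginx/manifests/latest")
by_digest = digest_for_request("/v2/library/nginx/manifests/" + digest)
blob = digest_for_request("/v2/library/nginx/blobs/" + digest)
```

A tag request yields `None` and would be left to the upstream registry. A digest request is answered purely on the basis of the digest, with no credential check anywhere in the path, which is the gap described above.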
How to secure registryd from access outside of the cluster?
This requires something like the following in the machine config (the first endpoint is registryd; the second is my registry mirror, which is unrelated):
ghcr.io:
    # List of endpoints (URLs) for registry mirrors to use.
    endpoints:
        - http://127.0.0.1:3172
        - http://172.20.0.1:5004
Of course, the final solution should be opt-in, configurable with a single flag that would:
- reconfigure CRI not to drop image layers from content store
- reconfigure mirror endpoints to inject "registryd" endpoint as the first one
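The endpoint injection could look roughly like the sketch below. It is an illustration only: `inject_registryd` is a hypothetical helper, the mirror configuration is reduced to a plain mapping from registry name to endpoint list, the registryd address is the one from the example config above, and the port 5000 mirror is made up.

```python
from typing import Dict, List

# Address taken from the example config above; assumed to be registryd.
REGISTRYD_ENDPOINT = "http://127.0.0.1:3172"


def inject_registryd(
    mirrors: Dict[str, List[str]],
    endpoint: str = REGISTRYD_ENDPOINT,
) -> Dict[str, List[str]]:
    """Return a copy of `mirrors` with `endpoint` first in every endpoint list."""
    return {
        registry: [endpoint] + [e for e in endpoints if e != endpoint]
        for registry, endpoints in mirrors.items()
    }


mirrors = {
    "ghcr.io": ["http://172.20.0.1:5004"],
    # Already contains the endpoint; it must not be duplicated.
    "docker.io": ["http://127.0.0.1:3172", "http://172.20.0.1:5000"],
}
injected = inject_registryd(mirrors)
```

Note that a registry with no entry in the mapping gets no registryd endpoint at all, which is why support for * redirects still needs work.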
Open questions:
- images with pull credentials - need to dig more into that... how the auth is applied to blobs/manifests, what if the layer is shared?
- securing access to registryd from outside of the cluster
I just found this and had a few thoughts. How would injecting the "registryd" endpoint as the first mirror endpoint work if the config has something like this?
docker.io:
    overridePath: true
    endpoints:
        - https://harbor.example.com/v2/dockerhub/
And don't mirror endpoints require listing every single source separately?
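The concern about overridePath can be made concrete with a small sketch of how the request URL is built. It assumes overridePath means "use the endpoint path exactly as given, without appending /v2" and that the flag applies to every endpoint in the list; `manifest_url` is a hypothetical helper, not a containerd or Talos function.

```python
def manifest_url(
    endpoint: str,
    repository: str,
    reference: str,
    override_path: bool = False,
) -> str:
    """Build the manifest URL a client would request from a mirror endpoint."""
    base = endpoint.rstrip("/")
    if not override_path:
        # Default behaviour: the registry API root /v2 is appended.
        base += "/v2"
    return f"{base}/{repository}/manifests/{reference}"


harbor = manifest_url(
    "https://harbor.example.com/v2/dockerhub/", "library/nginx", "latest",
    override_path=True,
)
local_default = manifest_url("http://127.0.0.1:3172", "library/nginx", "latest")
local_override = manifest_url(
    "http://127.0.0.1:3172", "library/nginx", "latest", override_path=True
)
print(harbor)
print(local_default)
print(local_override)
```

Under these assumptions, an endpoint injected into a list that has overridePath set would receive requests without the /v2 prefix, so it would either have to accept both path shapes or be given a path of its own.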
@ruifung I don't think I got your question, but this is an early PoC, not a real implementation yet, so some details are not known.
Afaik we can just use this as a forward for the image store?
@smira I'm not sure if you saw this project or not but it works great on Talos. It seems like what you want to do here, maybe you'll find some ideas looking thru the source.
https://github.com/XenitAB/spegel
That's great software actually, thanks for the tip!
Yes, this was the inspiration, but there is probably more that we could do more easily. This is not done yet.
@smira is there something specific in Spegel that you do not want, which is causing you to implement your own embedded registry?
It's not that there is anything wrong with Spegel, but rather that it's a generic solution, while on Talos Linux we have more control and more information, e.g. we have the discovery data. So it should be easier to implement and run it on Talos.
Also, it's our philosophy to keep things simple for end users: just flip the switch and you get a distributed image cache.
I agree with you, my thought was that Talos could embed Spegel the same way k3s does. You don't even have to use the libp2p router if you have some other way of routing the traffic. Most components are interfaces so it should be pretty easy to just replace the router with a custom implementation.
I think this is a great suggestion, and it would also be easier to maintain.
I've done some implementation of Spegel now, and I have to say: it basically does precisely what you describe here... It's pretty much "apply and forget".
This PR is stale because it has been open 45 days with no activity.