spegel
spegel copied to clipboard
Support for Bottlerocket
Currently Spegel does not work with Bottlerocket when run with EKS.
https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md
This needs to be explored further to see if it is possible to support.
After having a quick look at the Containerd configuration generated by Bottlerocket it seems like it does not currently support Spegel.
https://github.com/bottlerocket-os/bottlerocket/blob/7b67f9118151e67ac1e0972ef9d48eb3487c4ce9/packages/containerd/containerd-config-toml_k8s_containerd_sock#L47-L52
The only mirror configuration supported currently is the old syntax. There is an issue to track implementing the new syntax. It needs to be closed before Bottlerocket can be supported. bottlerocket-os/bottlerocket/issues/1963
I was about to test Spegel when I found this issue. Is bottlerocket team aware of this kind of limitation?
Thank you
@mballoni I don't know if there is any activity on the issue as it has been around since last year. I do not have time right now myself to contribute to the project so it depends on if anyone else prioritizes getting this done.
Is it possible to create a flag to disable the automatic node configuration, along with some docs for how to configure containerd on the node to make it work?
I could very easily tweak the user data on my bottlerocket nodes to have the right settings to work with spegel, I think -- the main issue is that spegel will crash because it tries to modify the settings itself and can't.
@empath-nirvana If you are talking about spegel writing containerd mirror configuration, it is actually documented, if not as visible as one might wish. Have a looke at https://github.com/spegel-org/spegel/tree/main/charts/spegel, specifically spegel.containerdMirrorAdd
.
If you succeed with getting Bottlerocket nodes to work, we'd love an update to ./docs/COMPATIBILITY.md with instructions.
Do you have some documentation on what the actual containerd configuration is supposed to be on the node? That might be useful for other OS's besides bottlerocket if you don't want spegel to configure it for you. I don't need bottlerocket-specific configuration, I can translate it to bottlerocket settings.
@empath-nirvana Spegel has no documentation about the exact configuration generated. This is because Spegel currently targets specific scenarios (see ./docs/COMPATIBILITY.md) and Spegel is supposed to "just work" in those scenarios. Currently, the mirror config injection works something like this:
The configuration is generated by the configuration subcommand. The actual configuration generated is currently simple. It takes the list provided via spegel.registires and turns them into containerd hosts.yaml files. If you run Kind and Spegel locally (see PR #458), you can inspect the generated config:
$ docker exec kind-worker cat /etc/containerd/certs.d/ghcr.io/hosts.toml
server = 'https://ghcr.io'
[host]
[host.'http://172.19.0.4:30020']
capabilities = ['pull', 'resolve']
[host.'http://172.19.0.4:30021']
capabilities = ['pull', 'resolve']
Sorry to be a nuisance about this, but it's throwing an error message:
{"time":"2024-05-06T13:32:41.843433252Z","level":"ERROR","source":{"function":"main.main","file":"/build/main.go","line":88},"msg":"run exit with error","err":"Containerd registry config path needs to be set for mirror configuration to take effect"}
https://github.com/spegel-org/spegel/blob/490c549a3935f5dceb4a1c23b15eeb22a8ed707f/pkg/oci/containerd.go#L104-L137
Now, I haven't done any configuration on the bottlerocket node to add the mirror configuration, but it seems like that really shouldn't be throwing an error even if i haven't?
the config path is set in the args: - '--containerd-registry-config-path=/etc/containerd/certs.d'
I'm sort of digging through the code, but it looks like there's a lot more than just the initial setup that depends on that file existing, in which case it's not going to be a small amount of work to make bottlerocket support it...
I commented out the verification lines that were throwing errors and rebuilt the container, and changed the helm chart so it binds to 127.0.0.1 instead of the node ip.. and set the bottle rocket config like this:
[[settings.container-registry.mirrors]] "registry" = "gcr.io" "endpoint" = ["127.0.0.1:30020", "127.0.0.1:30021"]
And i see messages like this in the logs:
"handling mirror request from external node"
So I assume that means it did work?
@empath-nirvana It seems you have it working then (though at this point it is hard to be completely sure since there currently is no way to see that a particular image arrived via spegel; see #24). Just so I understand fully, did you retain the check for "DiscardUnpackedLayers"?
@phillebaba I'm thinking the current Helm parameter spegel.containerdMirrorAdd
should be repurposed (renamed?) so that it instructs the registry command to skip the config path check? The current implementation is somewhat odd in that it takes the config path parameter, but the only thing it is used for is matching against containerd's reported config dir. Were you planning to use it for something else?
If we go down this path, we are indirectly saying we should support containerds with the older config format. Is that problematic?
No the check should be run no matter if the user wants Spegel to add the mirror configuration or not. Otherwise Spegel would start and people would think that things are working when they are not at all. We will never support the deprecated mirror configuration method. It has its limitations one of them being that the configuration is not hot reloaded. The second that it will be removed from Containerd.
It is important to understand the difference between the config path configuration and mirror configuration. The config path tells Containerd to look in a directory or directories for mirror configuration to load before pulling an image. This configuration is not hot reloaded and requires a restart of Containerd to take effect. The mirror configuration exists to tell Containerd about mirrors to try before attempting to pull the image from the original source. This configuration is hot reloaded as it is updated every time Containerd pulls an image.
The issue with Bottlerocket is that they have chosen not to update the Containerd configuration yet. The issue has been open for a while without much activity. While the work is not technically complicated to change, it may require some effort to figure out a migration path for existing users. I am not going to do this work as I am not a Bottlerocket user, so unless someone else will step up this issue will be blocked.
I'm not sure why hot reloading the containerd config is necessary if you have the correct configuration in the bottlerocket node when booting it?
Wouldn't it be necessary every time a new node arrives or gets removed from the groups?
In Bottlerockets case it would not be an issue as the configuration seems to be global and would be applied to all new nodes. Either way it is not something that I am going to support as the old configuration method is deprecated, and I do not have time debugging issues related to this. It would be a lot more valuable is Bottlerocket simply allowed for enabling the new configuration method instead. Or actually fixed it's mirror configuration to use the new method.
If you don't want to support it bottle rocket, that's fine. What I'm really asking for here is:
- Documentation for what the containerd configuration needs to be (whether it's hot reloaded or not)
- Documentation for how you can tell if it's actually working, which i believe there's already an open issue on.
Bottlerocket aside, there are going to be people who run on nodes where they won't have permissions to tweak containerd settings from a pod or don't want to, but may be able to update their node configuration in other ways.
The required Containerd configuration is already documented in the Compatibility docs. All Containerd system configuration requires a restart for changes to apply.
https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#compatibility
I have not come up with a good method of determining if Spegel is working or not. It is a best effort distributed cache, there are a lot of moving parts. A solution to run after install would be the easiest but would not verify that things are working later on. A continuously running check would be better but also be a lot more complicated. Maybe starting with a run once check would solve a lot of issues for initial debugging. Happy to take contributions if you have a simple and easy method to solve this.
Well if they are able to update their node configuration in other ways it is possible to disable Spegel from adding the mirror configuration. The only thing missing is to documented how the mirror configuration should look like, which would be an improvement to the docs. Spegel will however require that Containerd is properly configured, and restarted. How that is done is up to the user. Most cloud providers will add this configuration by default while others wont. If that configuration is not possible due to the provider, it will most likely be out of scope for Spegel to fix.
@phillebaba Just so I understand this clearly- does the solution here https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks not apply to bottlerocket nodes? After reading a few different issues on the bottlerocket-os project, I'm confused if the situation is:
- Spegel doesn't function out-of-the-box on bottlerocket or
- Spegel doesn't function at all on bottlerocket.
Spegel does not work at all with Bottlerocket as far as I know, if the information in the issue https://github.com/bottlerocket-os/bottlerocket/issues/1963 still holds true.
Currently Bottlerocket only uses the old mirror configuration for Containerd, which is not compatible with Spegel because it lacks certain features. Which is why I haven't spent time trying to figure out how to add compatibility. This old syntax will be removed in Containerd v2 so Bottlerocket should really update to the new syntax before it becomes a blocker to update Containerd version.
Sadly I am not a user of Bottlerocket so I don't really have the motivation to spend time to fix this. Technically it is not to difficult to fix the configuration, the challenge is documenting migration for existing users and verifying it wont break exiting use cases.
I did actually get this to work on bottlerocket by forking spegel and removing all the automatic containerd configuration and then updating the user data on the bottlerocket nodes to setup the mirrors to use spegel. So theoretically it's possible to get it to work, but it's not supported right now.