Add `features.md` to formalize the `runc features` JSON
Add features.md and features-linux.md to formalize the `runc features` JSON that was introduced in runc v1.1.0-rc.1.
A runtime caller MAY use this JSON to detect the features implemented by the runtime.
The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0-rc.1/types/features/features.go (opencontainers/runc#3296, opencontainers/runc#3310)
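For illustration, here is a minimal, hedged sketch of how a caller might invoke `runc features` and decode a subset of the JSON. The struct is abridged; the field names follow my reading of the runc types package linked above, so treat them as illustrative rather than normative.

```go
// Minimal sketch: run `runc features` and decode a few fields.
// The struct is abridged and illustrative, not the full schema.
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type Features struct {
	OCIVersionMin string            `json:"ociVersionMin,omitempty"`
	OCIVersionMax string            `json:"ociVersionMax,omitempty"`
	MountOptions  []string          `json:"mountOptions,omitempty"`
	Annotations   map[string]string `json:"annotations,omitempty"`
}

func main() {
	out, err := exec.Command("runc", "features").Output()
	if err != nil {
		panic(err) // e.g. runc < v1.1.0, which has no `features` subcommand
	}
	var f Features
	if err := json.Unmarshal(out, &f); err != nil {
		panic(err)
	}
	fmt.Printf("OCI spec versions %s..%s, %d recognized mount options\n",
		f.OCIVersionMin, f.OCIVersionMax, len(f.MountOptions))
}
```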
Updated PR.
Differences from runc v1.1.0-rc.1:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
@giuseppe Could you take a look from crun's perspective?
> @giuseppe Could you take a look from crun's perspective?
from crun's perspective, it looks good.
What could also be useful for us would be a way to know which log driver is supported by a runtime (the `--log-format` option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?
> What could also be useful for us would be a way to know which log driver is supported by a runtime (the `--log-format` option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?
That should probably be a custom `Annotations` field like `"org.opencontainers.runc.log-formats":"[\"text\",\"json\"]"`.
> That should probably be a custom `Annotations` field like `"org.opencontainers.runc.log-formats":"[\"text\",\"json\"]"`.
the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.
or are you suggesting that other OCI runtimes would use the same "org.opencontainers.runc.log-formats" key?
@giuseppe
The runc CLI flags like `--log-format` are currently out of the scope of the OCI Specs, so they have to be defined as annotations under the `org.opencontainers.runc.*` namespace.
Only runtimes with a runc-compatible CLI should use these `org.opencontainers.runc.*` annotations.
We can add proper fields for log formats when the CLI behaviors get defined in the OCI Specs.
This can potentially eliminate podman's containers.conf(5) parameters `runtime_supports_*`. Currently they have:
```
# List of the OCI runtimes that support --format=json. When json is supported
# engine will use it for reporting nicer errors.
#
#runtime_supports_json = ["crun", "runc", "kata", "runsc", "krun"]
# List of the OCI runtimes that supports running containers with KVM Separation.
#
#runtime_supports_kvm = ["kata", "krun"]
# List of the OCI runtimes that supports running containers without cgroups.
#
#runtime_supports_nocgroups = ["crun", "krun"]
```
As pointed out by @giuseppe above, runc (and crun) can add an annotation indicating that `--log-format` is supported.
Something similar could be done with other properties from the above config.
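As a rough sketch of how a podman-like engine could derive what `runtime_supports_json` expresses today: assuming runc-compatible runtimes adopted the `org.opencontainers.runc.log-formats` annotation proposed above, with its value being a JSON-encoded string array inside the features `annotations` map (that encoding is this thread's proposal, not part of the spec), the caller-side check might look like this.

```go
// Sketch only: decide whether --log-format=json can be used, based on the
// hypothetical org.opencontainers.runc.log-formats annotation from the
// runtime's features JSON, instead of a hard-coded runtime_supports_json list.
package runtimecaps // hypothetical package name

import "encoding/json"

func supportsJSONLogFormat(annotations map[string]string) bool {
	raw, ok := annotations["org.opencontainers.runc.log-formats"]
	if !ok {
		return false // runtime does not advertise any log formats
	}
	var formats []string
	if err := json.Unmarshal([]byte(raw), &formats); err != nil {
		return false // malformed value; fall back to "not supported"
	}
	for _, f := range formats {
		if f == "json" {
			return true
		}
	}
	return false
}
```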
@opencontainers/runtime-spec-maintainers
Could you take a look? Is there any remaining task?
@opencontainers/runtime-spec-maintainers Can this be merged? 🙏
@kolyshkin @thaJeztah Could you take a look? This has already been implemented in runc since v1.1.0-rc.1.
Updated the PR
Differences from runc v1.1.0:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
- 🆕 Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in opencontainers/runc#3588; a usage sketch follows below)
cc @kolyshkin for the seccomp changes
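As a usage sketch for the new seccomp fields, assuming (per the runc PR referenced above) that `knownFlags` lists the filter flags the runtime binary recognizes and `supportedFlags` lists the subset usable on the current kernel, a caller could trim the flags it puts into the config to what the runtime advertises:

```go
// Sketch: keep only the seccomp filter flags that appear in the runtime's
// .linux.seccomp.supportedFlags, so container creation does not fail at start
// time on older kernels or runtimes.
package runtimecaps // hypothetical package name

func filterSeccompFlags(requested, supported []string) []string {
	set := make(map[string]struct{}, len(supported))
	for _, f := range supported {
		set[f] = struct{}{}
	}
	kept := make([]string, 0, len(requested))
	for _, f := range requested {
		if _, ok := set[f]; ok {
			kept = append(kept, f)
		}
	}
	return kept
}
```

For example, `filterSeccompFlags([]string{"SECCOMP_FILTER_FLAG_LOG"}, supportedFlags)` would drop the flag on a runtime or kernel that does not advertise it (the flag name is just an example here).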
Let's merge this after releasing v1.1.0-rc.1: https://github.com/opencontainers/runtime-spec/pull/1175
I think this PR can be included in v1.1.0-rc.2, but if controversial it can be postponed to v1.2.0.
Rebased.
Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?
> Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?
I'd suggest invoking this on every container creation, but the implementation can opt-in to cache it too.
Isn't that a bit heavy just for a slightly better/earlier error message? (https://github.com/opencontainers/runtime-spec/pull/1130#discussion_r770457507)
Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. :see_no_evil:
> Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment))
"Check the error message and disable the unsupported feature" would work fine for just testing a single feature, but doesn't scale for checking multiple features.
Also, how do we propagate the machine-readable error message to the caller? Not sure we can just jsonify the stderr.
> Wouldn't it be better to update the requirements of something like Moby to a higher version of `runc` that does support `rro` and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. 🙈
Checking the version number is fine if the runtime caller only supports runc, but doesn't work well for supporting crun, youki, etc.
Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?
> Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?
Moby, containerd, etc. already have this sort of inconsistency; e.g., they conditionally enable AppArmor depending on its availability on the host. The same is highly likely to happen for other LSMs in the future, such as Landlock, once they are supported by the kernel and the OCI runtime.
Also, these high-level container engines may potentially have a feature to automatically translate `ro` mounts to `rro` to harden the "readonly" mounts, when that is supported by the kernel and the OCI runtime.
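A hedged sketch of that `ro` → `rro` hardening on the engine side, assuming the runtime advertises recursive read-only support as an `"rro"` entry in the top-level `mountOptions` list of the features JSON (that entry name is an assumption here; it depends on the runtime and version):

```go
// Sketch: additionally request "rro" on read-only bind mounts, but only when
// the runtime advertises the option, so older runtimes keep working unchanged.
package runtimecaps // hypothetical package name

func hardenReadOnly(advertisedMountOptions, mountOpts []string) []string {
	rro := false
	for _, o := range advertisedMountOptions {
		if o == "rro" {
			rro = true
			break
		}
	}
	if !rro {
		return mountOpts
	}
	for _, o := range mountOpts {
		if o == "ro" {
			return append(mountOpts, "rro") // keep "ro" and also ask for recursive read-only
		}
	}
	return mountOpts
}
```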
The features should be also useful for calling VM-based OCI runtimes. https://github.com/opencontainers/runtime-spec/blob/v1.1.0-rc.1/config-vm.md
The high-level container engine may conditionally choose the VM disk image file, depending on the disk formats (raw, qcow2, etc.) and the firmware types (BIOS, UEFI) supported by the OCI runtime.
(The current PR does not cover the VM-based runtimes, though; that can be covered in a separate PR.)
This KEP may also depend on the runc features command:
- https://github.com/kubernetes/enhancements/pull/3858#discussion_r1100869332
@opencontainers/runtime-spec-maintainers
We have 3 LGTMs. Can we merge this? Also, let's release the next RC right after deciding to merge this in v1.1 or not.
In case this is blocked on my comments (:heart:): I still don't feel like I actually understand fully what the use case here is (and why they're worth potentially invoking runc features for every container startup), but clearly several other maintainers do and I don't feel strongly enough to block it. :bow:
> what the use case here is
The runtime features are expected to be eventually propagated to Kubernetes node annotations (in Kube's own format) so that kube-scheduler can avoid using nodes that lack the features required by the Pods.
e.g., when a Pod is launched with `recursivelyReadOnly: Required` (proposed in https://github.com/kubernetes/enhancements/pull/3858), the scheduler should avoid using nodes that lack support for the `rro` mount type.
> why they're worth potentially invoking runc features for every container startup
No need to invoke runc features for every container startup.
High-level engines can just cache it, but they may face an inconsistency issue when the cache is not invalidated on upgrading the runc binary.
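One hedged sketch of such a cache (names and policy are my assumptions, not anything implemented here): key it on the runtime binary's path, size, and modification time, so an in-place upgrade of the runc binary naturally invalidates the cached features.

```go
// Sketch: cache the raw `features` output per runtime binary, keyed by the
// binary's path, size, and mtime, so the cache misses after the binary is
// upgraded in place. A real engine would also bound the cache and coordinate
// concurrent refreshes.
package runtimecaps // hypothetical package name

import (
	"fmt"
	"os"
	"sync"
)

type featuresCache struct {
	mu    sync.Mutex
	cache map[string][]byte
}

func (c *featuresCache) Get(runtimePath string, query func() ([]byte, error)) ([]byte, error) {
	st, err := os.Stat(runtimePath)
	if err != nil {
		return nil, err
	}
	key := fmt.Sprintf("%s|%d|%d", runtimePath, st.Size(), st.ModTime().UnixNano())

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache == nil {
		c.cache = make(map[string][]byte)
	}
	if out, ok := c.cache[key]; ok {
		return out, nil
	}
	out, err := query() // e.g. exec `<runtimePath> features`
	if err != nil {
		return nil, err
	}
	c.cache[key] = out
	return out, nil
}
```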