Add `features.md` to formalize the `runc features` JSON
Add features.md and features-linux.md to formalize the `runc features` JSON that was introduced in runc v1.1.0-rc.1.
A runtime caller MAY use this JSON to detect the features implemented by the runtime.
The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0-rc.1/types/features/features.go (opencontainers/runc#3296, opencontainers/runc#3310)
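For illustration, here is a minimal, hedged sketch of how a caller might invoke `runc features` and decode a subset of the JSON. The struct is abridged; the field names follow my reading of the runc types package linked above, so treat them as illustrative rather than normative.

```go
// Minimal sketch: run `runc features` and decode a few fields.
// The struct is abridged and illustrative, not the full schema.
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

type Features struct {
	OCIVersionMin string            `json:"ociVersionMin,omitempty"`
	OCIVersionMax string            `json:"ociVersionMax,omitempty"`
	MountOptions  []string          `json:"mountOptions,omitempty"`
	Annotations   map[string]string `json:"annotations,omitempty"`
}

func main() {
	out, err := exec.Command("runc", "features").Output()
	if err != nil {
		panic(err) // e.g. runc < v1.1.0, which has no `features` subcommand
	}
	var f Features
	if err := json.Unmarshal(out, &f); err != nil {
		panic(err)
	}
	fmt.Printf("OCI spec versions %s..%s, %d recognized mount options\n",
		f.OCIVersionMin, f.OCIVersionMax, len(f.MountOptions))
}
```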
Updated PR.
Differences from runc v1.1.0-rc.1:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
@giuseppe Could you take a look from crun's perspective?
> @giuseppe Could you take a look from crun's perspective?
from crun's perspective, it looks good.
What could also be useful for us would be a way to know which log driver is supported by a runtime (the `--log-format` option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?
> What could also be useful for us would be a way to know which log driver is supported by a runtime (the `--log-format` option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?
That should probably be a custom `Annotations` field like `"org.opencontainers.runc.log-formats":"[\"text\",\"json\"]"`.
> That should probably be a custom `Annotations` field like `"org.opencontainers.runc.log-formats":"[\"text\",\"json\"]"`.
the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.
or are you suggesting that other OCI runtimes would use the same "org.opencontainers.runc.log-formats" key?
@giuseppe
The runc CLI flags like `--log-format` are currently out of the scope of the OCI Specs, so they have to be defined as annotations under the `org.opencontainers.runc.*` namespace.
Only runtimes with a runc-compatible CLI should use these `org.opencontainers.runc.*` annotations.
We can add proper fields for log formats when the CLI behaviors get defined in the OCI Specs.
This can potentially eliminate podman's containers.conf(5) parameters `runtime_supports_*`. Currently they have:
```
# List of the OCI runtimes that support --format=json. When json is supported
# engine will use it for reporting nicer errors.
#
#runtime_supports_json = ["crun", "runc", "kata", "runsc", "krun"]
# List of the OCI runtimes that supports running containers with KVM Separation.
#
#runtime_supports_kvm = ["kata", "krun"]
# List of the OCI runtimes that supports running containers without cgroups.
#
#runtime_supports_nocgroups = ["crun", "krun"]
```
As pointed out by @giuseppe above, runc (and crun) can add an annotation indicating that `--log-format` is supported.
Something similar could be done with other properties from the above config.
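As a rough sketch of how a podman-like engine could derive what `runtime_supports_json` expresses today: assuming runc-compatible runtimes adopted the `org.opencontainers.runc.log-formats` annotation proposed above, with its value being a JSON-encoded string array inside the features `annotations` map (that encoding is this thread's proposal, not part of the spec), the caller-side check might look like this.

```go
// Sketch only: decide whether --log-format=json can be used, based on the
// hypothetical org.opencontainers.runc.log-formats annotation from the
// runtime's features JSON, instead of a hard-coded runtime_supports_json list.
package runtimecaps // hypothetical package name

import "encoding/json"

func supportsJSONLogFormat(annotations map[string]string) bool {
	raw, ok := annotations["org.opencontainers.runc.log-formats"]
	if !ok {
		return false // runtime does not advertise any log formats
	}
	var formats []string
	if err := json.Unmarshal([]byte(raw), &formats); err != nil {
		return false // malformed value; fall back to "not supported"
	}
	for _, f := range formats {
		if f == "json" {
			return true
		}
	}
	return false
}
```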
@opencontainers/runtime-spec-maintainers
Could you take a look? Is there any remaining task?
@opencontainers/runtime-spec-maintainers Can this be merged? 🙏
@kolyshkin @thaJeztah Could you take a look? This has already been implemented in runc since v1.1.0-rc.1.
Updated the PR
Differences from runc v1.1.0:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
- 🆕 Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in opencontainers/runc#3588; a usage sketch follows below)
cc @kolyshkin for the seccomp changes
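As a usage sketch for the new seccomp fields, assuming (per the runc PR referenced above) that `knownFlags` lists the filter flags the runtime binary recognizes and `supportedFlags` lists the subset usable on the current kernel, a caller could trim the flags it puts into the config to what the runtime advertises:

```go
// Sketch: keep only the seccomp filter flags that appear in the runtime's
// .linux.seccomp.supportedFlags, so container creation does not fail at start
// time on older kernels or runtimes.
package runtimecaps // hypothetical package name

func filterSeccompFlags(requested, supported []string) []string {
	set := make(map[string]struct{}, len(supported))
	for _, f := range supported {
		set[f] = struct{}{}
	}
	kept := make([]string, 0, len(requested))
	for _, f := range requested {
		if _, ok := set[f]; ok {
			kept = append(kept, f)
		}
	}
	return kept
}
```

For example, `filterSeccompFlags([]string{"SECCOMP_FILTER_FLAG_LOG"}, supportedFlags)` would drop the flag on a runtime or kernel that does not advertise it (the flag name is just an example here).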
Let's merge this after releasing v1.1.0-rc.1: https://github.com/opencontainers/runtime-spec/pull/1175
I think this PR can be included in v1.1.0-rc.2, but if controversial it can be postponed to v1.2.0.
Rebased.
Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?
> Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?
I'd suggest invoking this on every container creation, but the implementation can opt-in to cache it too.
Isn't that a bit heavy just for a slightly better/earlier error message? (https://github.com/opencontainers/runtime-spec/pull/1130#discussion_r770457507)
Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. :see_no_evil:
> Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment))
"Check the error message and disable the unsupported feature" would work fine for just testing a single feature, but doesn't scale for checking multiple features.
Also, how do we propagate the machine-readable error message to the caller? Not sure we can just jsonify the stderr.
> Wouldn't it be better to update the requirements of something like Moby to a higher version of `runc` that does support `rro` and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. 🙈
Checking the version number is fine if the runtime caller only supports runc, but doesn't work well for supporting crun, youki, etc.
Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?
> Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?
Moby, containerd, etc. already have this sort of inconsistency; e.g., they conditionally enable AppArmor depending on its availability on the host. The same is highly likely to happen for other LSMs in the future, such as Landlock, once they are supported by the kernel and the OCI runtime.
Also, these high-level container engines may potentially have a feature to automatically translate `ro` mounts to `rro` to harden the "readonly" mounts, when that is supported by the kernel and the OCI runtime.
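A hedged sketch of that `ro` → `rro` hardening on the engine side, assuming the runtime advertises recursive read-only support as an `"rro"` entry in the top-level `mountOptions` list of the features JSON (that entry name is an assumption here; it depends on the runtime and version):

```go
// Sketch: additionally request "rro" on read-only bind mounts, but only when
// the runtime advertises the option, so older runtimes keep working unchanged.
package runtimecaps // hypothetical package name

func hardenReadOnly(advertisedMountOptions, mountOpts []string) []string {
	rro := false
	for _, o := range advertisedMountOptions {
		if o == "rro" {
			rro = true
			break
		}
	}
	if !rro {
		return mountOpts
	}
	for _, o := range mountOpts {
		if o == "ro" {
			return append(mountOpts, "rro") // keep "ro" and also ask for recursive read-only
		}
	}
	return mountOpts
}
```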
The features should be also useful for calling VM-based OCI runtimes. https://github.com/opencontainers/runtime-spec/blob/v1.1.0-rc.1/config-vm.md
The high-level container engine may conditionally choose the VM disk image file, depending on the disk formats (raw, qcow2, etc.) and the firmware types (BIOS, UEFI) supported by the OCI runtime.
(The current PR does not cover the VM-based runtimes, though; that can be covered in a separate PR.)
This KEP may also depend on the runc features command:
- https://github.com/kubernetes/enhancements/pull/3858#discussion_r1100869332
@opencontainers/runtime-spec-maintainers
We have 3 LGTMs. Can we merge this? Also, let's release the next RC right after deciding to merge this in v1.1 or not.
In case this is blocked on my comments (:heart:): I still don't feel like I actually understand fully what the use case here is (and why they're worth potentially invoking runc features for every container startup), but clearly several other maintainers do and I don't feel strongly enough to block it. :bow:
> what the use case here is
The runtime features are expected to be eventually propagated to Kubernetes node annotations (in Kube's own format) so that kube-scheduler can avoid using nodes that lack the features required by the Pods.
e.g., when a Pod is launched with `recursivelyReadOnly: Required` (proposed in https://github.com/kubernetes/enhancements/pull/3858), the scheduler should avoid using nodes that lack support for the `rro` mount type.
> why they're worth potentially invoking runc features for every container startup
No need to invoke runc features for every container startup.
High-level engines can just cache it, but they may face an inconsistency issue when the cache is not invalidated on upgrading the runc binary.
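One hedged sketch of such a cache (names and policy are my assumptions, not anything implemented here): key it on the runtime binary's path, size, and modification time, so an in-place upgrade of the runc binary naturally invalidates the cached features.

```go
// Sketch: cache the raw `features` output per runtime binary, keyed by the
// binary's path, size, and mtime, so the cache misses after the binary is
// upgraded in place. A real engine would also bound the cache and coordinate
// concurrent refreshes.
package runtimecaps // hypothetical package name

import (
	"fmt"
	"os"
	"sync"
)

type featuresCache struct {
	mu    sync.Mutex
	cache map[string][]byte
}

func (c *featuresCache) Get(runtimePath string, query func() ([]byte, error)) ([]byte, error) {
	st, err := os.Stat(runtimePath)
	if err != nil {
		return nil, err
	}
	key := fmt.Sprintf("%s|%d|%d", runtimePath, st.Size(), st.ModTime().UnixNano())

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache == nil {
		c.cache = make(map[string][]byte)
	}
	if out, ok := c.cache[key]; ok {
		return out, nil
	}
	out, err := query() // e.g. exec `<runtimePath> features`
	if err != nil {
		return nil, err
	}
	c.cache[key] = out
	return out, nil
}
```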