runtime-spec icon indicating copy to clipboard operation
runtime-spec copied to clipboard

Add `features.md` to formalize the `runc features` JSON

Open AkihiroSuda opened this issue 3 years ago • 11 comments

Add features.md and features-linux.md, to formalize the runc features JSON that was introduced in runc v1.1.0-rc1.

A runtime caller MAY use this JSON to detect the features implemented by the runtime.

The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0-rc.1/types/features/features.go (opencontainers/runc#3296, opencontainers/runc#3310)

Differences from runc v1.1.0-rc.1

  • Add .linux.intelRdt.enabled field
  • Add .linux.cgroup.rdma field

AkihiroSuda avatar Dec 09 '21 09:12 AkihiroSuda

Updated PR.

Differences from runc v1.1.0-rc.1

  • Add .linux.intelRdt.enabled field
  • Add .linux.cgroup.rdma field

AkihiroSuda avatar Dec 16 '21 11:12 AkihiroSuda

@giuseppe Could you take a look from crun's perspective?

AkihiroSuda avatar Jan 12 '22 05:01 AkihiroSuda

@giuseppe Could you take a look from crun's perspective?

from crun's perspective, it looks good.

What could also be useful for us, would have a way to know what log-driver is supported by a runtime (the --log-format option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?

giuseppe avatar Jan 12 '22 08:01 giuseppe

What could also be useful for us, would have a way to know what log-driver is supported by a runtime (the --log-format option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?

That should be probably a custom Annotations field like "org.opencontainers.runc.log-formats":"[\"text\",\"json\"]".

AkihiroSuda avatar Jan 12 '22 08:01 AkihiroSuda

That should be probably a custom Annotations field like "org.opencontainers.runc.log-formats":"[\"text\",\"json\"]".

the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.

giuseppe avatar Jan 12 '22 10:01 giuseppe

the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.

or are you suggesting that other OCI runtimes would use the same "org.opencontainers.runc.log-formats" key?

giuseppe avatar Jan 12 '22 10:01 giuseppe

@giuseppe

The runc CLI flags like --log-format are currently out of the scope of the OCI Specs, so they have to be defined as annotations under org.opencontainers.runc.* namespace.

Only runtimes with runc-compatible CLI should use these org.opencontainers.runc.* annotations.

We can add proper fields for log formats when the CLI behaviors get defined in the OCI Specs.

AkihiroSuda avatar Jan 12 '22 10:01 AkihiroSuda

This can potentially eliminate podman's containers.conf(5), parameters runtime_supports_*. Currently they have

# List of the OCI runtimes that support --format=json.  When json is supported
# engine will use it for reporting nicer errors.
#
#runtime_supports_json = ["crun", "runc", "kata", "runsc", "krun"]

# List of the OCI runtimes that supports running containers with KVM Separation.
#
#runtime_supports_kvm = ["kata", "krun"]

# List of the OCI runtimes that supports running containers without cgroups.
#
#runtime_supports_nocgroups = ["crun", "krun"]

As pointed out by @giuseppe above, runc (and crun) can add an annotation telling that --log-format is supported.

Something similar could be done with other properties from the above config.

kolyshkin avatar Jan 28 '22 22:01 kolyshkin

@opencontainers/runtime-spec-maintainers

Could you take a look? Is there any remaining task?

AkihiroSuda avatar Apr 01 '22 07:04 AkihiroSuda

@opencontainers/runtime-spec-maintainers Can this be merged? 🙏

AkihiroSuda avatar May 09 '22 22:05 AkihiroSuda

@kolyshkin @thaJeztah Could you take a look? This has been already implemented in runc since v1.1.0-rc1.

AkihiroSuda avatar Aug 16 '22 07:08 AkihiroSuda

Updated the PR

Differences from runc v1.1.0

  • Add .linux.intelRdt.enabled field
  • Add .linux.cgroup.rdma field
  • 🆕 Add .linux.seccomp.knownFlags and .linux.seccomp.supportedFlags fields (Implemented in opencontainers/runc#3588)

cc @kolyshkin for the seccomp changes

AkihiroSuda avatar Jan 25 '23 14:01 AkihiroSuda

Let's merge this after releasing v1.1.0-rc.1: https://github.com/opencontainers/runtime-spec/pull/1175

I think this PR can be included in v1.1.0-rc.2, but if controversial it can be postponed to v1.2.0.

AkihiroSuda avatar Jan 26 '23 01:01 AkihiroSuda

Rebased.

AkihiroSuda avatar Feb 01 '23 01:02 AkihiroSuda

Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?

tianon avatar Feb 01 '23 05:02 tianon

Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?

I'd suggest invoking this on every container creation, but the implementation can opt-in to cache it too.

AkihiroSuda avatar Feb 01 '23 06:02 AkihiroSuda

Isn't that a bit heavy just for a slightly better/earlier error message? (https://github.com/opencontainers/runtime-spec/pull/1130#discussion_r770457507)

Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. :see_no_evil:

tianon avatar Feb 01 '23 06:02 tianon

Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment))

"Check the error message and disable the unsupported feature" would work fine for just testing a single feature, but doesn't scale for checking multiple features.

Also, how do we propagate the machine-readable error message to the caller? Not sure we can just jsonify the stderr.

Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. 🙈

Checking the version number is fine if the runtime caller only supports runc, but doesn't work well for supporting crun, youki, etc.

AkihiroSuda avatar Feb 01 '23 06:02 AkihiroSuda

Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?

tianon avatar Feb 01 '23 06:02 tianon

Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?

Moby, containerd, etc. already have this sort of inconsistency; e.g., they conditionally enable Apparmor depending on its availability on the host. Same is highly likely to happen for other LSMs in future, such as Landlocks, when they are supported by the kernel and the OCI runtime.

Also, these high-level container engines may potentially have a feature to automatically translate ro mounts to rro to harden the "readonly" mounts, when it is supported by the kernel and the OCI runtime.

AkihiroSuda avatar Feb 01 '23 09:02 AkihiroSuda

The features should be also useful for calling VM-based OCI runtimes. https://github.com/opencontainers/runtime-spec/blob/v1.1.0-rc.1/config-vm.md The high-level container engine may conditionally choose the VM disk image file, depending on the disk formats (raw, qcow2, etc.) and the firmwares (BIOS, UEFI) supported by the OCI runtime.

(The current PR does not cover the VM-based runtimes though. Can be covered in a separate PR).

AkihiroSuda avatar Feb 01 '23 10:02 AkihiroSuda

This KEP may also depend on the runc features command:

  • https://github.com/kubernetes/enhancements/pull/3858#discussion_r1100869332

AkihiroSuda avatar Feb 09 '23 07:02 AkihiroSuda

@opencontainers/runtime-spec-maintainers

We have 3 LGTMs. Can we merge this? Also, let's release the next RC right after deciding to merge this in v1.1 or not.

AkihiroSuda avatar Feb 28 '23 07:02 AkihiroSuda

In case this is blocked on my comments (:heart:): I still don't feel like I actually understand fully what the use case here is (and why they're worth potentially invoking runc features for every container startup), but clearly several other maintainers do and I don't feel strongly enough to block it. :bow:

tianon avatar Mar 07 '23 00:03 tianon

what the use case here

The runtime features are expected to be eventually propagated to Kubernetes node annotations (in Kube's own format) so that kube-scheduler can avoid using nodes that lack the features required by the Pods.

e.g., when a Pod is launched with recursivelyReadOnly: Required (proposed in https://github.com/kubernetes/enhancements/pull/3858), the scheduler should avoid using nodes that lacks support for rro mount types.

why they're worth potentially invoking runc features for every container startup

No need to invoke runc features for every container startup. High-level engines can just cache it, but they may face an inconsistency issue when the cache is not invalidated on upgrading the runc binary.

AkihiroSuda avatar Mar 07 '23 05:03 AkihiroSuda