bottlerocket icon indicating copy to clipboard operation
bottlerocket copied to clipboard

containerd: set process_rlimit_no_file_soft/hard

Open samjo-nyang opened this issue 4 years ago • 12 comments

What I'd like: I'd like to set the maximum number of open file descriptors on containers dynamically, such as set

[plugins."io.containerd.grpc.v1.cri"]
process_rlimit_no_file_soft = 1234
process_rlimit_no_file_hard = 12345

on https://github.com/bottlerocket-os/bottlerocket/blob/develop/packages/containerd/containerd-config-toml_k8s

(from https://github.com/bottlerocket-os/bottlerocket/blob/develop/packages/containerd/1002-cri-set-default-RLIMIT_NOFILE.patch)

Any alternatives you've considered: (noting)

samjo-nyang avatar Aug 09 '21 03:08 samjo-nyang

Hi @samjo-nyang, thanks for opening this issue. Can you tell us a bit more about the challenges (if any) you are experiencing with the hard-coded rlimit values? More information would help us to prioritize this feature request.

samuelkarp avatar Aug 10 '21 17:08 samuelkarp

  1. ProcessRLimitNoFileSoft: 65536 is not enough for some microservices.
  2. I have an unknown issue for akka clusters on the bottlerocket nodes. I contacted to the bottlerocket team, and I've got the following answer: Akka uses select() so potentially it's failing because of Bottlerocket's higher default soft limit for open file descriptors. Can the customer try building a custom bottlerocket variant with "65536" here set to "1024" / Of course, I can build the custom image, but I think that it is convenient if I have this feature.

samjo-nyang avatar Aug 11 '21 06:08 samjo-nyang

hi @samuelkarp even we are facing same issues with our microservices. It would be nice if we can push it further to 1048576 as mentioned here https://github.com/bottlerocket-os/bottlerocket/blob/084fc3b30ea7662029976ce8205fbadd6dadd91f/packages/containerd/1001-cri-set-default-RLIMIT_NOFILE.patch#L46. But couldn't find any doc on how to put this.

rverma-dev avatar Sep 10 '21 01:09 rverma-dev

Thanks! We'll take look at if / how we can make this more configurable.

zmrow avatar Sep 10 '21 17:09 zmrow

There's an open Kubernetes issue requesting the ability to make these limits configurable. At first blush, making the limits configurable via Kubernetes (per pod or something) may be a bit more desirable than setting the limits for the entire host.

zmrow avatar Sep 13 '21 15:09 zmrow

Stuck between a rock and a hard place, I come here looking for help. I've got a third-party distoless container which is in need of adjusted ulimits. While it would be great to see Kubernetes actually take this on, it doesn't seem like it has any real traction at this time, and even if it did, I'd expect it to take over a year to go from code commit to actually being something that would be actually available on EKS: beta stage of implementation, enabled by default, version available on EKS, and us being able to upgrade to that version. While I can ask for various third party binaries to include the needed calls to adjust the soft rlimit, it is likely an infinite road to travel.

Would it be possible to consider adding this configuration option to Bottlerocket? Would it be predicated on the patch/config making its way upstream? Would a patch to add the config options be considered?

nairb774 avatar Apr 21 '22 18:04 nairb774

I agree that it would be nice to make these configurable in Bottlerocket. I came across this issue today when exploring what would be involved in dropping CAP_NET_RAW from the default capabilities list.

As with rlimits, we'd need the ability to modify the base OCI spec in order to avoid just hard-coding a new set of defaults that won't work for all use cases.

Maybe an API like this:

[settings.oci-defaults]
capabilities = [
  "chown",
  "dac-override",
  ...,
]

[settings.oci-defaults.resource-limits.max-open-files]
soft-limit = 1024
hard-limit = 65536

(edit: switched container-defaults to oci-defaults since that aligns better with the existing settings.oci-hooks, and mechanically reflects what we'd be doing - adjusting the default OCI spec.)

bcressey avatar May 01 '22 22:05 bcressey

I've got a third-party distoless container which is in need of adjusted ulimits.

@nairb774 if I understand correctly, the application doesn't have the logic to raise its own limit, so we'd have to make the soft limit configurable, and couldn't simply increase the hard limit again.

bcressey avatar May 02 '22 20:05 bcressey

the application doesn't have the logic to raise its own limit, so we'd have to make the soft limit configurable, and couldn't simply increase the hard limit again.

Correct.

The specific third-party software where I ran into file-descriptor exhaustion was with Istio's Gateways, and with KNative's Activators. I could be wrong, but there didn't seem to be support in Istio's proxy app, Envoy or in KNative's Activator to tweaks the ulimit/rlimits as part of their startup. I'd almost expect Envoy to have something, but I can't seem to quickly find something there in the docs. The specific app that was the source of my comment was a KNative Activator hitting the file descriptor limits. It might still be worth opening tickets with those projects and I'll try to get around to that. Unfortunately, they are just the apps that are causing problems today. Having reasonable levers available to help recover systems when they are having trouble is much appreciated.

Thanks for considering the request!

nairb774 avatar May 03 '22 00:05 nairb774

As with rlimits, we'd need the ability to modify the base OCI spec in order to avoid just hard-coding a new set of defaults that won't work for all use cases.

If we want to support this, I believe we need to change how shimpei works and extends its functionality so that it updates the spec that is given by containerd. I don't think we can use a prestart hook for this since at the point when the hook is running, it is too late since the configurations for the container were already applied at that point.

I think we should extend shimpei even more, and have it inject the prestart and poststart hooks as well, since otherwise we will have two binaries modifying the spec in disk (oci-add-hooks will be the other binary).

What do you think @bcressey ?

arnaldo2792 avatar Jul 16 '22 00:07 arnaldo2792

@mchaker is already working on allowing some overrides in the default OCI spec used by containerd :tada: !

I agree that it would be nice to make these configurable in Bottlerocket. I came across this issue today when exploring what would be involved in dropping CAP_NET_RAW from the default capabilities list.

@bcressey, I think there is already a way drop capabilities, and add them as required through kubernetes configurations:

securityContext:
      capabilities:
        drop:
          - all
        add: ["NET_ADMIN", "SYS_ADMIN"]

Doesn't this fit the use case that you have in mind? Or do we want Bottlerocket to be more opinionated on what default capabilities a container should be using that are considered "safe", and thus we need a way to express that opinion?

arnaldo2792 avatar Sep 07 '22 23:09 arnaldo2792

Yes, in particular I'd like a way to drop CAP_NET_RAW by default, without requiring every pod spec to do that.

bcressey avatar Sep 08 '22 14:09 bcressey

This feature is underway and now targeting the 1.11.0 release.

mchaker avatar Sep 30 '22 15:09 mchaker

The implementation and testing of this feature is taking longer than initially anticipated. Unfortunately it is not going to make it into 1.11.

Apologies for the delay and thank you for your understanding.

mchaker avatar Nov 09 '22 23:11 mchaker