runc icon indicating copy to clipboard operation
runc copied to clipboard

clean cached rlimit nofile in go runtime

Open ls-ggg opened this issue 1 year ago • 12 comments

fix:#4195 In a relatively new runc version, in other words a new go version (in my environment the go version is go1.20.13), the runtime will cache the value of rlimit nofile, which will cause the container's rlimit nofile configuration to not take effect. This problem must occur when the customer configures the CAP_SYS_RESOURCE permission.

refer to:[commit](// https://github.com/golang/go/commit/f5eef58e4381259cbd84b3f2074c79607fb5c821#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6)

go runtime caches rlimit-nofile during startup phase image

and before exec,runtime will reset rlimit-nofile. image

ls-ggg avatar Mar 29 '24 10:03 ls-ggg

@ls-ggg can you do a rebase instead of a merge? I.e. do not add merge commits.

Also, can we have a test case?

kolyshkin avatar Apr 02 '24 00:04 kolyshkin

@ls-ggg can you do a rebase instead of a merge? I.e. do not add merge commits.

Also, can we have a test case?

@kolyshkin I've done the rebase and added test cases.

ls-ggg avatar Apr 02 '24 05:04 ls-ggg

OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695

So, to my best understanding, this PR is not needed.

For release-1.1 branch to fix this, we need something like #3813.

kolyshkin avatar Apr 02 '24 17:04 kolyshkin

For release-1.1 branch to fix this, we need something like #3813.

Opened #4239; PTAL

kolyshkin avatar Apr 02 '24 20:04 kolyshkin

OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695

So, to my best understanding, this PR is not needed.

Closing (not needed in main branch, opened #4239 for release-1.1 branch).

kolyshkin avatar Apr 02 '24 22:04 kolyshkin

Interestingly, runc-1.1.12 (as used in the original bug report, #4195) is compiled with Go 1.20.13:

$ ./runc-1.1.12 --version
runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4

and I was able to reproduce the issue with this version.

Found out that Go 1.20 has CL 476097 backported (as CL 478659), so Go 1.20 is also affected (since Go1.20.4).

kolyshkin avatar Apr 03 '24 01:04 kolyshkin

Found out that Go 1.20 has CL 476097 backported (as CL 478659), so Go 1.20 is also affected (since Go 1.20.4).

As well as Go 1.19.+ (CL 478660).

kolyshkin avatar Apr 03 '24 04:04 kolyshkin

For some reason, my alternative approach (#4239) is not working.

Testing this PR, I got this:

$ RUNC=~/git/runc/runc-4237-go1.21; i=0; $RUNC --version && while [ $i -lt 100000 ]; do LIM=$(sudo $RUNC exec bionic sh -c "ulimit -n"); if [ $LIM -ne 65536 ]; then echo "WHOOPSIE (iter $i) got numfile $LIM"; break; fi; ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i"; let i++; done
runc version 1.1.0+dev
commit: v1.1.0-994-g4641f17e-dirty
spec: 1.2.0
go: go1.21.7
libseccomp: 2.5.1
[Tue 02 Apr 2024 10:10:36 PM PDT]          0
[Tue 02 Apr 2024 10:10:38 PM PDT]        100
...
[Tue 02 Apr 2024 10:23:23 PM PDT]      42300
WHOOPSIE (iter 42333) got numfile 1048576

So there is still some race going on :(

kolyshkin avatar Apr 03 '24 05:04 kolyshkin

WHOOPSIE (iter 42333) got numfile 1048576

So there is still some race going on :(

Note the number is now as go runtime sets it for itself.

kolyshkin avatar Apr 03 '24 05:04 kolyshkin

Most importantly, this fix does not work :-\

kolyshkin avatar Apr 05 '24 00:04 kolyshkin

OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695

So, to my best understanding, this PR is not needed.

For release-1.1 branch to fix this, we need something like #3813.

@kolyshkin
The #3813 cannot fix it. the correct way is to clear the cache in the runtime.

ls-ggg avatar Apr 12 '24 08:04 ls-ggg

Can this mr be approved?

ls-ggg avatar Apr 28 '24 07:04 ls-ggg

Can this mr be approved?

As I said a month ago, this fix does not work.

kolyshkin avatar Apr 29 '24 21:04 kolyshkin

closing in favor of #4265, which incorporates this work.

kolyshkin avatar May 08 '24 17:05 kolyshkin