clean cached rlimit nofile in go runtime
fix:#4195 In a relatively new runc version, in other words a new go version (in my environment the go version is go1.20.13), the runtime will cache the value of rlimit nofile, which will cause the container's rlimit nofile configuration to not take effect. This problem must occur when the customer configures the CAP_SYS_RESOURCE permission.
refer to:[commit](// https://github.com/golang/go/commit/f5eef58e4381259cbd84b3f2074c79607fb5c821#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6)
go runtime caches rlimit-nofile during startup phase
and before exec,runtime will reset rlimit-nofile.
@ls-ggg can you do a rebase instead of a merge? I.e. do not add merge commits.
Also, can we have a test case?
@ls-ggg can you do a rebase instead of a merge? I.e. do not add merge commits.
Also, can we have a test case?
@kolyshkin I've done the rebase and added test cases.
OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695
So, to my best understanding, this PR is not needed.
For release-1.1 branch to fix this, we need something like #3813.
For release-1.1 branch to fix this, we need something like #3813.
Opened #4239; PTAL
OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695
So, to my best understanding, this PR is not needed.
Closing (not needed in main branch, opened #4239 for release-1.1 branch).
Interestingly, runc-1.1.12 (as used in the original bug report, #4195) is compiled with Go 1.20.13:
$ ./runc-1.1.12 --version
runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4
and I was able to reproduce the issue with this version.
Found out that Go 1.20 has CL 476097 backported (as CL 478659), so Go 1.20 is also affected (since Go1.20.4).
Found out that Go 1.20 has CL 476097 backported (as CL 478659), so Go 1.20 is also affected (since Go 1.20.4).
As well as Go 1.19.+ (CL 478660).
For some reason, my alternative approach (#4239) is not working.
Testing this PR, I got this:
$ RUNC=~/git/runc/runc-4237-go1.21; i=0; $RUNC --version && while [ $i -lt 100000 ]; do LIM=$(sudo $RUNC exec bionic sh -c "ulimit -n"); if [ $LIM -ne 65536 ]; then echo "WHOOPSIE (iter $i) got numfile $LIM"; break; fi; ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i"; let i++; done
runc version 1.1.0+dev
commit: v1.1.0-994-g4641f17e-dirty
spec: 1.2.0
go: go1.21.7
libseccomp: 2.5.1
[Tue 02 Apr 2024 10:10:36 PM PDT] 0
[Tue 02 Apr 2024 10:10:38 PM PDT] 100
...
[Tue 02 Apr 2024 10:23:23 PM PDT] 42300
WHOOPSIE (iter 42333) got numfile 1048576
So there is still some race going on :(
WHOOPSIE (iter 42333) got numfile 1048576So there is still some race going on :(
Note the number is now as go runtime sets it for itself.
Most importantly, this fix does not work :-\
OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: https://go-review.googlesource.com/c/sys/+/476695
So, to my best understanding, this PR is not needed.
For release-1.1 branch to fix this, we need something like #3813.
@kolyshkin
The #3813 cannot fix it. the correct way is to clear the cache in the runtime.
Can this mr be approved?
closing in favor of #4265, which incorporates this work.