runc icon indicating copy to clipboard operation
runc copied to clipboard

Support fetching PSI stats for cgroupv2 containers

Open dqminh opened this issue 3 years ago • 12 comments

This supports fetching PSI stats for cgroupv2 containers. We read the PSI metrics if they are available from:

  • cpu.pressure
  • memory.pressure
  • io.pressure

See more about PSI at https://www.kernel.org/doc/html/latest/accounting/psi.html

dqminh avatar Jan 28 '22 17:01 dqminh

Can we please have s/Psi/PSI/ since this is an acronym? I know we use things like Json before, but in the new code we can do better.

kolyshkin avatar Jan 31 '22 19:01 kolyshkin

ping @opencontainers/runc-maintainers

dqminh avatar Feb 24 '22 11:02 dqminh

ping @opencontainers/runc-maintainers again

dqminh avatar Mar 22 '22 11:03 dqminh

@dqminh can you please rebase? We have changed something in the tests and they need to be re-run.

kolyshkin avatar Mar 23 '22 02:03 kolyshkin

@kolyshkin thanks, rebase is done. I had to add # shellcheck disable=SC2030 for the integration test.

dqminh avatar Mar 23 '22 11:03 dqminh

ping @kolyshkin, this would be great to get in, especially for use downstream in cAdvisor / k8s

bobbypage avatar Mar 29 '22 19:03 bobbypage

@kolyshkin updated the nits, also added a fix to check for ENOTSUP when kernel compiled with PSI_DEFAULT_DISABLED, so reading *.pressure file return ENOTSUP as seen in https://cirrus-ci.com/task/4929935776153600

dqminh avatar May 13 '22 16:05 dqminh

ping @kolyshkin for another round of reviews :)

bobbypage avatar Jun 07 '22 20:06 bobbypage

gentle ping to @dqminh regarding the updates mentioned. would be great to get this in so we can consume PSI metrics downstream in cAdvisor / k8s

bobbypage avatar Jul 01 '22 22:07 bobbypage

@kolyshkin Maybe we can address those issues after merging?

AkihiroSuda avatar Jul 29 '22 14:07 AkihiroSuda

Hey @dqminh this is a great PR. There has been no activity for some time so I am wondering if you are still interested? Otherwise I would be willing to take over and make the requested changes.

Furisto avatar Sep 08 '22 15:09 Furisto

@kolyshkin Given that this PR has stalled, how would you like to proceed? I can make the requested changes. Would be great if we could make PSI metrics available.

Furisto avatar Sep 19 '22 09:09 Furisto

@thaJeztah i'm really sorry about the delay. We have been running with the patch for so long (!) that i forgot it hasn't been merged.

I left a few comments in the PR on my fork. Generally i think we can try to make the refactoring of existing practices separately from the current change.

dqminh avatar Oct 11 '22 10:10 dqminh

@dqminh I read the comments and tried to fix everything and opened https://github.com/dqminh/runc/pull/37 . I have here issue I can't compile it and test wait for fetching containers since 15 minutes. Dirty, because I did not commit and pushed changes when I tried to make.

% make
go build -trimpath "-buildmode=pie"  -tags "seccomp" -ldflags "-X main.gitCommit=v1.1.0-181-g6414629a-dirty -X main.version=1.1.0+dev " -o runc .
# pkg-config --cflags  -- libseccomp
Package libseccomp was not found in the pkg-config search path.
Perhaps you should add the directory containing `libseccomp.pc'
to the PKG_CONFIG_PATH environment variable
Package 'libseccomp', required by 'virtual:world', not found
pkg-config: exit status 1
make: *** [Makefile:33: runc] Error 2
zsh: exit 2     make

Tests run and fail for some reason, that might be problem of my system, not the code, but let's see what the test suite says.

I hope this helps to get it merged. :)

szuecs avatar Nov 04 '22 21:11 szuecs

Stray comment: As of linux 6.1 there's an interface called cgroup.pressure to switch PSI accounting on/off per cgroup. I suppose the code here will have to check if accounting is turned off.

h-vetinari avatar Nov 07 '22 01:11 h-vetinari

FYI: I created https://github.com/opencontainers/runc/pull/3679 as follow up to get this merged

szuecs avatar Dec 06 '22 16:12 szuecs