Use UnmarshalTo instead of UnmarshalAny when dealing with cgroup metrics.
After encountering the issue here: https://github.com/containerd/containerd/issues/9052, it appears that the type assertion was failing as the typeof data was an interface{} despite the concrete type being a pointer to v1.Metrics. We can instead unmarshal the data directly into the expected type variable which appears to work as expected.
Hi @alam0rt. Thanks for your PR.
I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The unit test below didn't result in any error to me (with v1.7.5). Do you mind running it on your end?
func TestMetrics(t *testing.T) {
m := &v1.Metrics{
CPU: &stats.CPUStat{
Usage: &stats.CPUUsage{
Total: 50,
},
},
}
stats, err := typeurl.MarshalAny(m)
assert.NoError(t, err)
data, err := typeurl.UnmarshalAny(stats)
assert.NoError(t, err)
_, ok := data.(*v1.Metrics)
assert.True(t, ok)
dst := &v1.Metrics{}
err = typeurl.UnmarshalTo(stats, dst)
assert.NoError(t, err)
}
The unit test below didn't result in any error to me (with v1.7.5). Do you mind running it on your end?
func TestMetrics(t *testing.T) { m := &v1.Metrics{ CPU: &stats.CPUStat{ Usage: &stats.CPUUsage{ Total: 50, }, }, } stats, err := typeurl.MarshalAny(m) assert.NoError(t, err) data, err := typeurl.UnmarshalAny(stats) assert.NoError(t, err) _, ok := data.(*v1.Metrics) assert.True(t, ok) dst := &v1.Metrics{} err = typeurl.UnmarshalTo(stats, dst) assert.NoError(t, err) }
Damn, won't compile on OSX due to build flags.
I tried using ctr metrics
Oct 10 00:08:24 containerd[1448]: {"error":null,"level":"error","msg":"invalid metric type for 535655fc1c59d67a1c3c0e6a0f81f1636b6758485b80e162c730be9f86de0360","time":"2023-10-10T00:08:24.769770849Z"}
ID TIMESTAMP
535655fc1c59d67a1c3c0e6a0f81f1636b6758485b80e162c730be9f86de0360 seconds:1696896532 nanos:163248295
METRIC VALUE
memory.usage_in_bytes 30474240
memory.limit_in_bytes 268435456
memory.stat.cache 2060288
cpuacct.usage 2204998916
cpuacct.usage_percpu [144178557 89870576 31787241 48788881 365999279 34713446 43435968 539148328 53590245 112969897 453806691 44179111 145900230 33398840 31158931 32072695]
pids.current 24
pids.limit 0
No issue there. Will continue looking
Some more info: I believe this started occuring after the upgrade to 1.7.0 which I can see is when gogoproto was removed?
I added a print to https://github.com/containerd/containerd/blob/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373/vendor/github.com/containerd/typeurl/v2/types.go https://github.com/zendesk/containerd/blob/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373/vendor/github.com/containerd/typeurl/v2/types.go and can see that the typeURL Is gogoproto
case gogoproto.Message:
err = gogoproto.Unmarshal(value, t)
fmt.Printf("(typeURL: %v), gogoproto: %s\n", typeURL, t.String())
}
containerd[58435]: (typeURL: io.containerd.cgroups.v1.Metrics), gogoproto: &Metrics{H....
@alam0rt What build of containerd are you using? I was wondering how the types could conflict since a panic will occur with multiple registrations. However, typeurl, gogoproto, and protobuf packages each have their own global registrations. I'm wondering why your build has this type registered while others don't seem to.
@alam0rt What build of containerd are you using? I was wondering how the types could conflict since a panic will occur with multiple registrations. However, typeurl, gogoproto, and protobuf packages each have their own global registrations. I'm wondering why your build has this type registered while others don't seem to.
Hey @dmcgowan, we aren't using anything to esoteric I don't think
# containerd github.com/containerd/containerd v1.7.5 https://github.com/containerd/containerd/commit/fe457eb99ac0e27b3ce638175ef8e68a7d2bc373
$ runc --version
runc version 1.1.8
commit: v1.1.8-0-g82f18fe0
spec: 1.0.2-dev
go: go1.20.3
libseccomp: 2.5.4
Any guidance on this would be great - not sure why we are seeing this error considering the code seems fine + we are running a proper release
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @alam0rt, @dmcgowan, @kiashok, @henry118 I would like to flag this patch as being important, I am currently seeing issues with stats when using Containerd 1.7.x with default runC (and gVisor), however when switching back to any of the 1.6.x releases this problem is resolved. I have also checked the stats returned from the various shims and they line up with the expected fields required by Containerd.
I also see that ctr currently uses the suggested method by @alam0rt to parse the metrics results from the shim API (see here), so it would be a good idea to keep the implementations similar anyway?
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.
This PR was closed because it has been stalled for 7 days with no activity.
I would suggest reopening this pr as the issue still exists and this pr fixes it.
Unmarshal io.containerd.cgroups.v1.Metrics with UnmarshalAny in 1.7+ (don't track which version it started with, but the latest version still has this issue) will first be parsed as github.com/containerd/cgroups/stats/v1.Metrics.
If UnmarshalAny in a unit test it shouldn't have this problem because it's not importing the wrong Metrics
This is due to hcsshim causing the dependency to have both github.com/containerd/cgroups/stats/v1/ and . /vendor/github.com/containerd/cgroups/v3/cgroup1/stats
How to reproduce this issue: env:
- containerd release/1.7 or 1.7.19
- cgroup v1
steps:
- configure /etc/containerd/config.toml to enable metrics
[metrics]
address = "127.0.0.1:7000"
- curl 127.0.0.1:7000/v1/metrics | grep ^container_
- You will notice that there is no container_* indicator.
- Check the logs of containerd, error log ->
{"error":null,"level":"error","msg":"invalid metric type for ...
This pr should rebase though, and it's not the issue in 2.0 because there aren't two v1/stat.Metrics