istio icon indicating copy to clipboard operation
istio copied to clipboard

Make cluster_name stat tag regex not lazy

Open keithmattix opened this issue 1 year ago • 1 comments

Please provide a description of this PR:

Fixes #51761.

Envoy allows specifying stat tags by writing a regex to parse the default metrics it generates. Envoy removes the first capture group from the regex and then sets the tag value to the second capture group. The existing regex for cluster_name (that Istio has had for 6 years) had a lazy qualifier in its second capture group, causing the regex parser to stop at the first match. This means non-Kubernetes FQDNs (i.e. ending in svc.cluster.local) like api.facebook.com would have incorrect values for the cluster_name metric label. This PR fixes that. Adding DNM so I can add a test

keithmattix avatar Jun 28 '24 19:06 keithmattix

This looks like a huge change on metric lable which may break users' alert, should we introduce a flag to give a little time for migration?

zirain avatar Jul 02 '24 22:07 zirain

The key with this approach is that it doesn't effect any metrics that aren't already broken. If you have a cluster named api.facebook.com the cluster_name stat tag is just going to be api and the metric name is messed up. This should not affect any correct metrics

keithmattix avatar Jul 02 '24 22:07 keithmattix

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

howardjohn avatar Jul 02 '24 22:07 howardjohn

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

This PR is an alternate approach; I made 2 different PRs to compare

keithmattix avatar Jul 02 '24 23:07 keithmattix

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

This PR is an alternate approach; I made 2 different PRs to compare

Not sure that answers my question. Does the ; end up in the final Prometheus metric or is it ignored?

howardjohn avatar Jul 02 '24 23:07 howardjohn

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

This PR is an alternate approach; I made 2 different PRs to compare

Not sure that answers my question. Does the ; end up in the final Prometheus metric or is it ignored?

What I was trying to say is that there's no ; in this PR 🙂 this one does the underscore swap. But to answer your question your question, no the ; doesn't show up in the final metric name

keithmattix avatar Jul 02 '24 23:07 keithmattix

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

This PR is an alternate approach; I made 2 different PRs to compare

Not sure that answers my question. Does the ; end up in the final Prometheus metric or is it ignored?

What I was trying to say is that there's no ; in this PR 🙂 this one does the underscore swap. But to answer your question your question, no the ; doesn't show up in the final metric name

oh sorry, mixed them up. The ; approach seems better then, is there case where this approach is?

howardjohn avatar Jul 02 '24 23:07 howardjohn

@keithmattix with this, do we end up with the tag in the metrics containing the ;? Or is it dropped as part of the extraction

This PR is an alternate approach; I made 2 different PRs to compare

Not sure that answers my question. Does the ; end up in the final Prometheus metric or is it ignored?

What I was trying to say is that there's no ; in this PR 🙂 this one does the underscore swap. But to answer your question your question, no the ; doesn't show up in the final metric name

oh sorry, mixed them up. The ; approach seems better then, is there case where this approach is?

IMO it's a question of risk tolerance. The cluster_name regex has been around for ~6 years and the ; PR changes it where this one doesn't. I think the ; approach is better as well, and as I worked on it I couldn't think of any edge cases. If there's a consensus on that approach, I'll close this one

keithmattix avatar Jul 03 '24 01:07 keithmattix

Closing in favor of #51861

keithmattix avatar Jul 03 '24 16:07 keithmattix