beats icon indicating copy to clipboard operation
beats copied to clipboard

[metricbeat/openmetrics, metricbeat/prometheus] histogram values of zero are filtered out on non-amd64 platforms

Open jonathan-albrecht-ibm opened this issue 6 months ago • 10 comments

Proposed commit message

[metricbeat/module/prometheus] [metricbeat/module/openmetrics] Keep histogram CumulativeCount and SampleCount and summary SampleCount as float64 internally to avoid undefined behavior when checking for NaN and Inf.

For openmetrics and prometheus histograms, some values like Bucket.CumulativeCount are stored as int64 after being parsed as float64 from the prometheus text parser. Later the values are checked to see if they are NaN or Inf but that is undefined behavior in go (https://github.com/golang/go/issues/67756#issuecomment-2142850931).

  • Internally store histogram CumulativeCount and SampleCount and summary SampleCount as float64
  • Only check float64 type values are checked for NaN and Inf
  • Cast float64 values to int64 to keep externally visible types the same
  • Add new tests with NaN and Inf values
  • Update existing test to cover case where 0 values were filtered out on non-x86_64 platforms

The underlying parser is github.com/prometheus/prometheus/model/textparse which returns values as float64 so there is no loss of precision due to this PR. The values returned by textparse are just kept as float64 internally so they can be correctly checked for NaN and Inf. The final conversion to unit64 is still done so there shouldn't be any user visible change other that fixing the bug.

Checklist

  • [x ] My code follows the style guidelines of this project
  • [ ] ~~I have commented my code, particularly in hard-to-understand areas~~
  • [ ] ~~I have made corresponding changes to the documentation~~
  • [ ] ~~I have made corresponding change to the default configuration files~~
  • [ x] I have added tests that prove my fix is effective or that my feature works
  • [x ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None

Author's Checklist

  • [ ]

How to test this PR locally

Related tests should pass on darwin/arm64

Related issues

  • Closes #34235

Use cases

Screenshots

Logs

jonathan-albrecht-ibm avatar Jun 11 '25 14:06 jonathan-albrecht-ibm

:robot: GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

github-actions[bot] avatar Jun 11 '25 14:06 github-actions[bot]

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @jonathan-albrecht-ibm? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

mergify[bot] avatar Jun 11 '25 14:06 mergify[bot]

@shmsr, This PR should fix the bug you mentioned in https://github.com/elastic/beats/pull/40411/files#r1740184901. My testing was done on the s390x platform but from what I have been able to find s390x and arm64 have the same behavior wrt NaN and Inf. Would be great if you could have a look at this PR as well.

jonathan-albrecht-ibm avatar Jun 11 '25 14:06 jonathan-albrecht-ibm

Thanks for the PR @jonathan-albrecht-ibm !!!

belimawr avatar Jun 12 '25 15:06 belimawr

This pull request is now in conflicts. Could you fix it? 🙏 To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b nan-cast-pr upstream/nan-cast-pr
git merge upstream/main
git push upstream nan-cast-pr

mergify[bot] avatar Jun 12 '25 16:06 mergify[bot]

@shmsr, This PR should fix the bug you mentioned in https://github.com/elastic/beats/pull/40411/files#r1740184901. My testing was done on the s390x platform but from what I have been able to find s390x and arm64 have the same behavior wrt NaN and Inf. Would be great if you could have a look at this PR as well.

Yep, I know. When I encountered this issue and debugged it, I came across your issue; I have even linked it there. Thanks! :)

shmsr avatar Jun 12 '25 18:06 shmsr

/test

ishleenk17 avatar Jun 13 '25 20:06 ishleenk17

@ishleenk17 @belimawr @shmsr @gizas There were some go lint errors in the previous test run. I fixed the one where the uint64p function was not used any more but the three other ones are unrelated to my changes. They are all Error return value is not checked (errcheck) and its not obvious to me how to fix them.

Do I need to fix them in this PR? Could they be skipped so this PR can be merged?

jonathan-albrecht-ibm avatar Jun 16 '25 14:06 jonathan-albrecht-ibm

/test

ishleenk17 avatar Jun 16 '25 19:06 ishleenk17

@ishleenk17 @belimawr @shmsr @gizas There were some go lint errors in the previous test run. I fixed the one where the uint64p function was not used any more but the three other ones are unrelated to my changes. They are all Error return value is not checked (errcheck) and its not obvious to me how to fix them.

Do I need to fix them in this PR? Could they be skipped so this PR can be merged?

Not required. The CI has passed now. Waiting for review from @elastic/elastic-agent-data-plane team

ishleenk17 avatar Jun 17 '25 06:06 ishleenk17

@ishleenk17 CI is green and the PR approved by all teams, if you agree, I'll go ahead and merge it.

belimawr avatar Jun 25 '25 13:06 belimawr

@ishleenk17 CI is green and the PR approved by all teams, if you agree, I'll go ahead and merge it.

Hi @ishleenk17, just wanted to follow up to see if this PR can be merged.

jonathan-albrecht-ibm avatar Jul 03 '25 11:07 jonathan-albrecht-ibm

Thanks for reviewing and merging this PR everyone!

jonathan-albrecht-ibm avatar Jul 03 '25 11:07 jonathan-albrecht-ibm