[metricbeat/openmetrics, metricbeat/prometheus] histogram values of zero are filtered out on non-amd64 platforms
Proposed commit message
[metricbeat/module/prometheus] [metricbeat/module/openmetrics] Keep histogram CumulativeCount and SampleCount and summary SampleCount as float64 internally to avoid undefined behavior when checking for NaN and Inf.
For openmetrics and prometheus histograms, some values like Bucket.CumulativeCount are stored as int64 after being parsed as float64 from the prometheus text parser. Later the values are checked to see if they are NaN or Inf but that is undefined behavior in go (https://github.com/golang/go/issues/67756#issuecomment-2142850931).
- Internally store histogram CumulativeCount and SampleCount and summary SampleCount as float64
- Only check float64 type values are checked for NaN and Inf
- Cast float64 values to int64 to keep externally visible types the same
- Add new tests with NaN and Inf values
- Update existing test to cover case where 0 values were filtered out on non-x86_64 platforms
The underlying parser is github.com/prometheus/prometheus/model/textparse which returns values as float64 so there is no loss of precision due to this PR. The values returned by textparse are just kept as float64 internally so they can be correctly checked for NaN and Inf. The final conversion to unit64 is still done so there shouldn't be any user visible change other that fixing the bug.
Checklist
- [x ] My code follows the style guidelines of this project
- [ ] ~~I have commented my code, particularly in hard-to-understand areas~~
- [ ] ~~I have made corresponding changes to the documentation~~
- [ ] ~~I have made corresponding change to the default configuration files~~
- [ x] I have added tests that prove my fix is effective or that my feature works
- [x ] I have added an entry in
CHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.
Disruptive User Impact
None
Author's Checklist
- [ ]
How to test this PR locally
Related tests should pass on darwin/arm64
Related issues
- Closes #34235
Use cases
Screenshots
Logs
:robot: GitHub comments
Expand to view the GitHub comments
Just comment with:
rundocs-build: Re-trigger the docs validation. (use unformatted text in the comment!)
This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @jonathan-albrecht-ibm? 🙏. For such, you'll need to label your PR with:
- The upcoming major version of the Elastic Stack
- The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)
To fixup this pull request, you need to add the backport labels for the needed branches, such as:
backport-8./dis the label to automatically backport to the8./dbranch./dis the digitbackport-active-allis the label that automatically backports to all active branches.backport-active-8is the label that automatically backports to all active minor branches for the 8 major.backport-active-9is the label that automatically backports to all active minor branches for the 9 major.
@shmsr, This PR should fix the bug you mentioned in https://github.com/elastic/beats/pull/40411/files#r1740184901. My testing was done on the s390x platform but from what I have been able to find s390x and arm64 have the same behavior wrt NaN and Inf. Would be great if you could have a look at this PR as well.
Thanks for the PR @jonathan-albrecht-ibm !!!
This pull request is now in conflicts. Could you fix it? 🙏 To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/
git fetch upstream
git checkout -b nan-cast-pr upstream/nan-cast-pr
git merge upstream/main
git push upstream nan-cast-pr
@shmsr, This PR should fix the bug you mentioned in https://github.com/elastic/beats/pull/40411/files#r1740184901. My testing was done on the s390x platform but from what I have been able to find s390x and arm64 have the same behavior wrt NaN and Inf. Would be great if you could have a look at this PR as well.
Yep, I know. When I encountered this issue and debugged it, I came across your issue; I have even linked it there. Thanks! :)
/test
@ishleenk17 @belimawr @shmsr @gizas There were some go lint errors in the previous test run. I fixed the one where the uint64p function was not used any more but the three other ones are unrelated to my changes. They are all Error return value is not checked (errcheck) and its not obvious to me how to fix them.
Do I need to fix them in this PR? Could they be skipped so this PR can be merged?
/test
@ishleenk17 @belimawr @shmsr @gizas There were some go lint errors in the previous test run. I fixed the one where the
uint64pfunction was not used any more but the three other ones are unrelated to my changes. They are allError return value is not checked (errcheck)and its not obvious to me how to fix them.Do I need to fix them in this PR? Could they be skipped so this PR can be merged?
Not required. The CI has passed now. Waiting for review from @elastic/elastic-agent-data-plane team
@ishleenk17 CI is green and the PR approved by all teams, if you agree, I'll go ahead and merge it.
@ishleenk17 CI is green and the PR approved by all teams, if you agree, I'll go ahead and merge it.
Hi @ishleenk17, just wanted to follow up to see if this PR can be merged.
Thanks for reviewing and merging this PR everyone!