[metricbeat/prometheus] Add panic recovery for Prometheus textparser

Open shmsr opened this issue 3 weeks ago • 3 comments

Proposed commit message

This PR hardens the Prometheus metrics parser against panics caused by malformed input data. The underlying Prometheus textparse library can panic on certain malformed inputs when calling parser.Labels() or parser.Exemplar(). These panics can crash Metricbeat when scraping endpoints that return unexpected data.

  • Panic recovery: Added safeLabels and safeExemplar wrapper functions that call recover() inside a deferred function to catch panics from the Prometheus parser library and return them as errors
  • Nil pointer fix: Fixed MetricFamily.GetUnit() to check for nil Unit before dereferencing
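
For readers unfamiliar with the pattern, the two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `labelReader`, `brokenReader`, and the simplified `MetricFamily` struct are hypothetical stand-ins, and the real textparse parser has a different method signature.

```go
package main

import "fmt"

// labelReader is a hypothetical stand-in for the part of a parser whose
// Labels method may panic on malformed input. It is NOT the textparse API.
type labelReader interface {
	Labels() map[string]string
}

// safeLabels calls r.Labels() and converts a panic into an error.
// recover() only stops a panic when called directly inside a deferred function.
func safeLabels(r labelReader) (labels map[string]string, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			labels = nil
			err = fmt.Errorf("recovered from panic in Labels(): %v", rec)
		}
	}()
	return r.Labels(), nil
}

// brokenReader simulates a parser that panics on bad input.
type brokenReader struct{}

func (brokenReader) Labels() map[string]string {
	panic("malformed input")
}

// MetricFamily is a simplified, hypothetical version of the struct named
// in the PR description; only the field needed for the example is shown.
type MetricFamily struct {
	Unit *string
}

// GetUnit returns the unit, or "" when the receiver or Unit is nil,
// instead of dereferencing a nil pointer.
func (m *MetricFamily) GetUnit() string {
	if m != nil && m.Unit != nil {
		return *m.Unit
	}
	return ""
}

func main() {
	_, err := safeLabels(brokenReader{})
	fmt.Println("error:", err)

	var mf MetricFamily
	fmt.Printf("unit: %q\n", mf.GetUnit())
}
```

The named return values are what let the deferred function replace the result after a panic; with unnamed returns the caller would receive zero values and no error.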

Checklist

  • [x] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • [x] I have added an entry in ./changelog/fragments using the changelog tool.

Author's Checklist

  • Safeguard against panics when encountering unexpected data
  • Added TestParseMetricFamiliesMalformedInput with known crash-inducing inputs
  • Added fuzz test FuzzParseMetricFamilies to discover future crash inputs
  • Added unit tests for struct getter methods to increase coverage
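
As a rough idea of what a fuzz test of this kind looks like, here is a minimal sketch using Go's built-in fuzzing support. It is not the FuzzParseMetricFamilies test from this PR: `parseLine` is a hypothetical toy parser, and `FuzzParseLine` only checks the one property that matters here, that the parser returns instead of panicking.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// parseLine is a hypothetical toy parser standing in for the real one.
// It expects "name value"; the indexing below panics on shorter input,
// and the deferred recover turns that panic into an error.
func parseLine(line string) (name, value string, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			name, value = "", ""
			err = fmt.Errorf("recovered from panic: %v", rec)
		}
	}()
	fields := strings.Fields(line)
	return fields[0], fields[1], nil
}

// FuzzParseLine seeds the corpus with one valid line and two malformed
// ones, then lets the fuzzer mutate them. Any panic escaping parseLine
// fails the fuzz run.
func FuzzParseLine(f *testing.F) {
	f.Add("http_requests_total 42")
	f.Add("")
	f.Add("{")
	f.Fuzz(func(t *testing.T, line string) {
		_, _, _ = parseLine(line)
	})
}

func main() {
	// Direct call with a malformed line, outside the fuzzer.
	_, _, err := parseLine("only_one_field")
	fmt.Println("got error:", err != nil)
}
```

Saved in a file ending in `_test.go`, the fuzz target could be run with `go test -fuzz=FuzzParseLine`; crash-inducing inputs found this way are typically added as fixed cases to a regular test.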

shmsr avatar Dec 04 '25 10:12 shmsr

:robot: GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

github-actions[bot] avatar Dec 04 '25 10:12 github-actions[bot]

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @shmsr? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

mergify[bot] avatar Dec 04 '25 10:12 mergify[bot]

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Dec 04 '25 11:12 elasticmachine

@mergifyio backport 8.19 9.1 9.2 9.3

github-actions[bot] avatar Dec 17 '25 05:12 github-actions[bot]

backport 8.19 9.1 9.2 9.3

❌ No backport have been created

GitHub error: Branch not found

mergify[bot] avatar Dec 17 '25 05:12 mergify[bot]