
[beatreceiver] - Add status reporting

Open VihasMakwana opened this issue 5 months ago • 7 comments

Proposed commit message

This PR adds status reporting for beatreceivers. The status reporting is added while creating the runners. The first PR (https://github.com/elastic/beats/pull/44528) was quite "hacky": it had to go deep into the code to inject status reporters.

This PR adds a runner factory wrapper that will:

  1. Call the parent factory to create the runner
  2. Inject status reporter
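The two-step wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the libbeat code: every type and function here (Status, StatusReporter, Runner, StatusAware, RunnerFactory, WithStatusReporter, and the demo types) is hypothetical and simplified, and the real libbeat factory signature differs.

```go
package main

import "fmt"

// Status is a hypothetical health level; the real beats/OTel types differ.
type Status string

const (
	StatusRunning  Status = "Running"
	StatusDegraded Status = "Degraded"
)

// StatusReporter is a hypothetical sink for status updates.
type StatusReporter interface {
	UpdateStatus(s Status, msg string)
}

// Runner is a simplified, hypothetical stand-in for a beat runner.
type Runner interface {
	Start()
	Stop()
}

// StatusAware is a hypothetical opt-in interface for runners that accept a reporter.
type StatusAware interface {
	SetStatusReporter(r StatusReporter)
}

// RunnerFactory is a simplified, hypothetical factory signature.
type RunnerFactory interface {
	Create(cfg map[string]string) (Runner, error)
}

// statusFactory wraps a parent factory and injects a reporter into every runner it creates.
type statusFactory struct {
	parent   RunnerFactory
	reporter StatusReporter
}

// WithStatusReporter is a hypothetical helper that returns the wrapped factory.
func WithStatusReporter(parent RunnerFactory, r StatusReporter) RunnerFactory {
	return &statusFactory{parent: parent, reporter: r}
}

func (f *statusFactory) Create(cfg map[string]string) (Runner, error) {
	// 1. Call the parent factory to create the runner.
	runner, err := f.parent.Create(cfg)
	if err != nil {
		return nil, err
	}
	// 2. Inject the status reporter, if the runner supports one.
	if sa, ok := runner.(StatusAware); ok {
		sa.SetStatusReporter(f.reporter)
	}
	return runner, nil
}

// --- Demo types (hypothetical) ---

type demoRunner struct {
	name     string
	reporter StatusReporter
}

func (r *demoRunner) SetStatusReporter(rep StatusReporter) { r.reporter = rep }

func (r *demoRunner) Start() {
	if r.reporter != nil {
		r.reporter.UpdateStatus(StatusRunning, r.name+" started")
	}
}

func (r *demoRunner) Stop() {}

type demoFactory struct{}

func (demoFactory) Create(cfg map[string]string) (Runner, error) {
	return &demoRunner{name: cfg["name"]}, nil
}

// recorder collects status updates so they can be inspected.
type recorder struct{ updates []string }

func (r *recorder) UpdateStatus(s Status, msg string) {
	r.updates = append(r.updates, fmt.Sprintf("%s: %s", s, msg))
}

func main() {
	rec := &recorder{}
	factory := WithStatusReporter(demoFactory{}, rec)
	runner, err := factory.Create(map[string]string{"name": "system/metrics"})
	if err != nil {
		panic(err)
	}
	runner.Start()
	fmt.Println(rec.updates) // [Running: system/metrics started]
}
```

Because the wrapper only decorates the parent factory, code paths that use the unwrapped factory (for example, standalone beats) are unaffected.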

The code responsible for these tasks will live in libbeat, and we will enable it only for beatreceivers. At a high level, the beat receiver will do the following:

  1. The beater will be created in createReceiver
  2. We will add the factory wrapper https://github.com/elastic/beats/blob/344bbcefae3c7be4f6f9a7ff0b1e7985caf0823c/x-pack/libbeat/cmd/instance/receiver.go#L80-L83
  3. The receiver will kick off the beater https://github.com/elastic/beats/blob/62864922d3227e4586ad0b53c9c0dfb213df3f69/x-pack/libbeat/cmd/instance/receiver.go#L76-L81

Note:

To accomplish the above steps, the runners must be created in beater.Run(...). Currently, metricbeat creates runners during the beater creation phase and starts them in beater.Run(...). This PR moves the runner creation code into beater.Run(...) to closely align with filebeat's implementation.
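A small sketch of why the creation point matters, under simplified assumptions: the beater type, its Factory field, newBeater, and wrap below are all hypothetical, not metricbeat's actual code. Because the factory wrapper is installed after the beater is created, runners built eagerly at construction time never pass through it, while runners built inside Run do.

```go
package main

import "fmt"

// Runner and RunnerFactory are simplified, hypothetical types.
type Runner interface{ Name() string }

type RunnerFactory func(name string) Runner

type plainRunner struct{ name string }

func (r plainRunner) Name() string { return r.name }

// beater is a hypothetical beater that holds a replaceable runner factory.
type beater struct {
	Factory RunnerFactory
	modules []string
	runners []Runner
}

// newBeater optionally creates runners eagerly, like metricbeat before this change.
func newBeater(modules []string, eager bool) *beater {
	b := &beater{
		Factory: func(n string) Runner { return plainRunner{n} },
		modules: modules,
	}
	if eager {
		b.runners = b.createRunners()
	}
	return b
}

// createRunners builds one runner per module using whatever factory is set now.
func (b *beater) createRunners() []Runner {
	var rs []Runner
	for _, m := range b.modules {
		rs = append(rs, b.Factory(m))
	}
	return rs
}

// Run creates runners lazily (the filebeat-style approach) if none exist yet.
func (b *beater) Run() []string {
	if b.runners == nil {
		b.runners = b.createRunners()
	}
	var names []string
	for _, r := range b.runners {
		names = append(names, r.Name())
	}
	return names
}

// wrap is a hypothetical wrapper that marks runners it creates as status-reporting.
func wrap(parent RunnerFactory) RunnerFactory {
	return func(n string) Runner { return plainRunner{parent(n).Name() + "+status"} }
}

func main() {
	for _, eager := range []bool{true, false} {
		b := newBeater([]string{"system"}, eager)
		b.Factory = wrap(b.Factory) // the wrapper is installed after construction
		fmt.Println(eager, b.Run())
	}
	// Prints:
	// true [system]
	// false [system+status]
}
```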

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

  • Related https://github.com/elastic/elastic-agent/issues/8210

Screenshots

[Screenshot 2025-05-29 at 8:14:07 PM]

Output

Here's the output of running two degraded streams together:

┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (DEGRADED) 1 or more components/units in a degraded state
   ├─ pipeline:logs/_agent-component/filestream-default
   │  ├─ status: StatusRecoverableError [error while running harvester: cannot read from file source: /var/log/elasticAgent-install-20240625_133733.log]
   │  ├─ exporter:elasticsearch/_agent-component/default
   │  │  └─ status: StatusOK
   │  └─ receiver:filebeatreceiver/_agent-component/filestream-default
   │     └─ status: StatusRecoverableError [error while running harvester: cannot read from file source: /var/log/elasticAgent-install-20240625_133733.log]
   └─ pipeline:logs/_agent-component/system/metrics-default
      ├─ status: StatusRecoverableError [Error fetching data for metricset system.process: error fetching process list: non fatal error; reporting partial metrics: error fetching PID metrics for 607 processes, most likely a "permission denied" error. Enable debug logging to determine the exact cause.]
      ├─ exporter:elasticsearch/_agent-component/default
      │  └─ status: StatusOK
      └─ receiver:metricbeatreceiver/_agent-component/system/metrics-default
         └─ status: StatusRecoverableError [Error fetching data for metricset system.process: error fetching process list: non fatal error; reporting partial metrics: error fetching PID metrics for 607 processes, most likely a "permission denied" error. Enable debug logging to determine the exact cause.]
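In the output above, each pipeline reports the status of its most severe component: the exporter is StatusOK, the receiver is StatusRecoverableError, and the pipeline shows the receiver's error. The sketch below reproduces that "worst status wins" rule under a simplified, assumed severity ordering; the ComponentStatus type and aggregate function are hypothetical, and the OpenTelemetry Collector's real aggregation has more states and rules.

```go
package main

import "fmt"

// Status uses an assumed severity ordering for illustration only.
type Status int

const (
	StatusOK Status = iota
	StatusRecoverableError
	StatusPermanentError
	StatusFatalError
)

func (s Status) String() string {
	return [...]string{"StatusOK", "StatusRecoverableError", "StatusPermanentError", "StatusFatalError"}[s]
}

// ComponentStatus is a hypothetical per-component status record.
type ComponentStatus struct {
	Name   string
	Status Status
	Err    string
}

// aggregate returns the most severe component status (and its error) for a pipeline.
func aggregate(components []ComponentStatus) ComponentStatus {
	worst := ComponentStatus{Status: StatusOK}
	for _, c := range components {
		if c.Status > worst.Status {
			worst = c
		}
	}
	return worst
}

func main() {
	p := aggregate([]ComponentStatus{
		{Name: "exporter:elasticsearch", Status: StatusOK},
		{Name: "receiver:filebeatreceiver", Status: StatusRecoverableError, Err: "error while running harvester"},
	})
	fmt.Printf("%s [%s]\n", p.Status, p.Err)
	// Prints: StatusRecoverableError [error while running harvester]
}
```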

Testing

  1. Check out this PR locally
  2. Go to elastic-agent and follow this guide to test local beats changes
  3. Package the agent with mage package
  4. Follow the steps in https://github.com/elastic/elastic-agent/issues/8210 to install the agent and verify the status

Closes https://github.com/elastic/elastic-agent/issues/8210

VihasMakwana · Jun 12 '25 12:06

:robot: GitHub comments


Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

github-actions[bot] · Jun 12 '25 12:06

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @VihasMakwana? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

mergify[bot] · Jun 12 '25 12:06

Quite an elegant solution for this problem!

mauri870 · Jun 12 '25 13:06

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine · Jun 17 '25 05:06

This pull request is now in conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b wrap-runner-factory upstream/wrap-runner-factory
git merge upstream/main
git push upstream wrap-runner-factory

mergify[bot] · Jun 17 '25 13:06

@VihasMakwana Could you please fix the conflicts? Thank you!

mauri870 · Jun 17 '25 15:06

@mauri870 @khushijain21 I've added new test cases and made changes to the benchmark module for testing. We can now make the benchmark module return an error when needed, to test status reporting. Please take a look!

VihasMakwana · Jun 20 '25 11:06