TBS: Document discard_on_write_failure + expose it to the APM Integration

Open lucabelluccini opened this issue 10 months ago • 4 comments

Potential follow-up to https://github.com/elastic/apm-server/pull/15159

  • Docs: document discard_on_write_failure
  • Integration (APM): expose the setting via
    • https://github.com/elastic/integrations/blob/a55e9eecb844d1d9192eb8584b81c687a213ef1b/packages/apm/agent/input/template.yml.hbs#L74
    • https://github.com/elastic/integrations/blob/main/packages/apm/manifest.yml

Changes required

The changes required for this task will be similar to those for the sampling.tail.ttl task in https://github.com/elastic/apm-server/issues/13525.

lucabelluccini avatar Jan 22 '25 14:01 lucabelluccini

It was intentional to leave sampling.tail.discard_on_write_failure undocumented, as it was supposed to be an escape hatch for users facing TBS storage limit issues while we look for a long-term solution.

I understand the pain of configuring this setting (and the many others not exposed in the APM integration package), and I agree we should make it easier.

Regarding documentation, I'm not sure the team is committed to supporting this config in the long run. cc @simitt

carsonip avatar Jan 22 '25 15:01 carsonip

  • APM Integrations PR: https://github.com/elastic/integrations/pull/13950
    • [x] v8.19.0 Backport: https://github.com/elastic/integrations/pull/14029
    • [x] Validate agent policy for 9.1.0-SNAPSHOT using local stack with elastic-package stack up -d --version 9.1.0-SNAPSHOT
    • [x] Validate agent policy for 8.19.0
  • Docs:
    • [x] 9.1 docs PR: https://github.com/elastic/docs-content/pull/1453
    • [x] 8.x docs PR: https://github.com/elastic/observability-docs/pull/4908
  • APM Integration policy UI Issue: https://github.com/elastic/kibana/issues/221441
    • [x] Perform E2E validation to confirm the config is correctly applied from the UI
      • [x] 9.1.0 complete as part of release plan
      • [x] 8.19.0
  • [x] Update any APM Server docs

isaacaflores2 avatar May 23 '25 21:05 isaacaflores2

Per @rubvs's recommendation, I documented all the steps to validate the APM integration changes here: apm-integrations-validation-steps.md

isaacaflores2 avatar May 23 '25 22:05 isaacaflores2

@isaacaflores2 IMO https://github.com/elastic/apm-server/tree/main/dev_docs would be a good place for documenting how to test integration package changes for APM. You can reduce the discard_on_write_failure specifics, or keep them in as an example.

simitt avatar May 26 '25 10:05 simitt

UI PR: https://github.com/elastic/kibana/pull/224479

edit: UI PR merged and e2e testing with the UI in progress (9.1.0 has been completed as part of the release test plan). 8.19.0 still pending.

We are still waiting for the release to complete so we can merge the docs PR.

isaacaflores2 avatar Jun 18 '25 23:06 isaacaflores2

I think the page https://www.elastic.co/docs/solutions/observability/apm/tail-based-sampling should be updated to describe sampling.tail.discard_on_write_failure and its consequences (false -> ES will receive everything if TBS storage is full / true -> there will be data loss), as the setting will soon be available in the APM Integration and is already available in APM Server.
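
For illustration, a minimal sketch of where the setting would sit in a standalone apm-server.yml. It assumes the usual `apm-server.sampling.tail.*` layout; the sample rate and storage limit are placeholder values, not recommendations.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        - sample_rate: 0.1        # placeholder policy: keep 10% of traces
      storage_limit: 3GB          # placeholder limit for local TBS storage
      # false (default): if events cannot be written to TBS storage (for
      #   example, the storage limit is reached), they bypass sampling and
      #   are indexed, so Elasticsearch receives everything.
      # true: events that cannot be written are discarded, which means data
      #   loss and possibly broken traces.
      discard_on_write_failure: false
```

The trade-off is between extra indexing load on Elasticsearch (false) and losing trace data (true) whenever TBS storage writes fail.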

lucabelluccini avatar Jun 25 '25 11:06 lucabelluccini

Hey @lucabelluccini, we have a PR for the doc changes here: https://github.com/elastic/docs-content/pull/1453. To avoid any confusion from updating the docs before the release, we are waiting to merge it.

isaacaflores2 avatar Jun 25 '25 19:06 isaacaflores2

Thank you @isaacaflores2 !

lucabelluccini avatar Jun 26 '25 10:06 lucabelluccini

The 9.1 docs PR has been merged.

isaacaflores2 avatar Jul 28 '25 22:07 isaacaflores2