TBS: Document discard_on_write_failure + expose it to the APM Integration
Potential follow up of https://github.com/elastic/apm-server/pull/15159
- Docs: document discard_on_write_failure
- Integration (APM): expose the setting via
- https://github.com/elastic/integrations/blob/a55e9eecb844d1d9192eb8584b81c687a213ef1b/packages/apm/agent/input/template.yml.hbs#L74
- https://github.com/elastic/integrations/blob/main/packages/apm/manifest.yml
Changes required
This task will be similar to the sampling.tail.ttl task in https://github.com/elastic/apm-server/issues/13525 in terms of the changes required
It was intentional to leave sampling.tail.discard_on_write_failure undocumented, as it was meant to be an escape hatch for users facing TBS storage limit issues while we look for a long-term solution.
I understand the pain of configuring this setting (and the many other settings not exposed in the APM integration package), and I agree we should make it easier.
Regarding documentation, I'm not sure whether the team is committed to supporting this config in the long run. cc @simitt
- APM Integrations PR: https://github.com/elastic/integrations/pull/13950
- [x] v8.19.0 Backport: https://github.com/elastic/integrations/pull/14029
- [x] Validate agent policy for 9.1.0-SNAPSHOT using local stack with elastic-package stack up -d --version 9.1.0-SNAPSHOT
- [x] Validate agent policy for 8.19.0
- Docs:
- [x] 9.1 docs PR: https://github.com/elastic/docs-content/pull/1453
- [x] 8.x docs PR: https://github.com/elastic/observability-docs/pull/4908
- APM Integration policy UI Issue: https://github.com/elastic/kibana/issues/221441
- [x] Perform E2E validation to confirm the config is correctly applied from the UI
- [x] 9.1.0 complete as part of release plan
- [x] 8.19.0
- [x] Perform E2E validation to confirm the config is correctly applied from the UI
- [x] Update any APM Server docs
Per @rubvs's recommendation, I documented all the steps to validate the APM integration changes here: apm-integrations-validation-steps.md
@isaacaflores2 IMO https://github.com/elastic/apm-server/tree/main/dev_docs would be a good place for documenting how to test integration package changes for APM. You can reduce the discard_on_write_failure specifics, or keep them in as an example.
UI PR: https://github.com/elastic/kibana/pull/224479
edit: UI PR merged and e2e testing with the UI in progress (9.1.0 has been completed as part of the release test plan). 8.19.0 still pending.
We are still waiting for the release to complete so we can merge the docs PR.
I think the page https://www.elastic.co/docs/solutions/observability/apm/tail-based-sampling should be updated to describe sampling.tail.discard_on_write_failure and its consequences (false -> ES will receive everything if TBS storage is full / true -> there will be data loss), as the setting will soon be available in the APM Integration and is already available in APM Server.
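For reference, a minimal sketch of how the two behaviors might be selected in a standalone apm-server.yml. It assumes the setting lives under the usual apm-server.sampling.tail namespace alongside enabled and policies; the sample rate is a purely illustrative value, not a recommendation.

```yaml
# Sketch only: values are illustrative, and the layout assumes the
# standard apm-server.sampling.tail namespace.
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        - sample_rate: 0.1   # illustrative default policy
      # false: if events cannot be written to TBS storage (e.g. the storage
      #        limit is reached), they bypass sampling and are sent to ES.
      # true:  events that cannot be written are discarded (data loss).
      discard_on_write_failure: false
```

With false, hitting the storage limit trades sampling for a higher indexing load on Elasticsearch; with true, it trades completeness of traces for a bounded load.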
Hey @lucabelluccini, we have a PR for the doc changes here: https://github.com/elastic/docs-content/pull/1453. To avoid the confusion that updating the docs before the release could cause, we are waiting to merge it until the release is complete.
Thank you @isaacaflores2 !
The 9.1 Doc PR has been merged