fix(datadog_agent source): Silence the "Source send cancelled." error
Summary
Eliminated the "Source send cancelled." error and corresponding metric for the datadog_agent source, as Datadog Agent will always resend events when the connection is dropped after a timeout.
Change Type
- [x] Bug fix
- [ ] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
Is this a breaking change?
- [ ] Yes
- [x] No
How did you test this PR?
Does this PR include user facing changes?
- [x] Yes. Please add a changelog fragment based on our guidelines.
- [ ] No. A maintainer will apply the "no-changelog" label to this PR.
Notes
- Please read our Vector contributor resources.
- Do not hesitate to use `@vectordotdev/vector` to reach out to us regarding this PR.
- The CI checks run only after we manually approve them.
  - We recommend adding a `pre-push` hook, please see this template.
  - Alternatively, we recommend running the following locally before pushing to the remote branch:
    - `cargo fmt --all`
    - `cargo clippy --workspace --all-targets -- -D warnings`
    - `cargo nextest run --workspace` (alternatively, you can run `cargo test --all`)
    - `./scripts/check_changelog_fragments.sh`
- After a review is requested, please avoid force pushes to help us review incrementally.
  - Feel free to push as many commits as you want. They will be squashed into one before merging.
  - For example, you can run `git merge origin master` and `git push`.
- If this PR introduces changes to Vector dependencies (modifies `Cargo.lock`), please run `cargo vdev build licenses` to regenerate the license inventory and commit the changes (if any). More details here.
Datadog Report
Branch report: bruceg/OPA-3143-datadog-agent
Commit report: 216ae08
Test service: vector
:x: 1 Failed (1 Known Flaky), 7 Passed, 0 Skipped, 25.47s Total Time
:x: Failed Tests (1)
- datadog::logs::validate-vector::e2e - :snowflake: Known flaky
  Test has failed
This is reliably breaking the datadog-logs E2E test. It appears that the agent-vector configuration is not delivering logs to Vector, but I can't see why. I have reproduced it on my system, but the same test also fails on current master, so I have no idea where to go on this.
Hello,
I'm not sure if you're taking feedback on this PR, but my team uses Vector extensively and we find this error and associated metric very useful. "Source send cancelled" due to a timeout can mean that Vector is saturated and/or that a component is applying backpressure. "Source send cancelled" can be a symptom of various capacity issues including (but not limited to) saturated sink buffers, undersized compute, or even just general slowness on whatever's on the other side of the sink (e2e acknowledgements affect some but not all of these).
If possible, we'd like to keep these metrics and log events. They act as an early warning sign for us because even though the Datadog Agent will retransmit, an increase in "Source send cancelled" errors means that we are beginning to erode our safety margin.
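To illustrate the backpressure point, here is a minimal, standalone sketch of how a full downstream buffer turns into cancelled sends. It is not Vector's implementation: `SendOutcome`, `send_with_timeout`, and `count_cancelled` are hypothetical names, and a bounded `std::sync::mpsc` channel with no consumer stands in for a saturated sink buffer.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical outcome of one source send attempt (not a Vector type).
#[derive(Debug, PartialEq)]
enum SendOutcome {
    Delivered,
    Cancelled,
}

/// Retry pushing `event` into the bounded channel until `timeout` elapses.
/// A full channel models downstream backpressure.
fn send_with_timeout(tx: &SyncSender<u32>, event: u32, timeout: Duration) -> SendOutcome {
    let deadline = Instant::now() + timeout;
    let mut pending = event;
    loop {
        match tx.try_send(pending) {
            Ok(()) => return SendOutcome::Delivered,
            Err(TrySendError::Full(returned)) => {
                if Instant::now() >= deadline {
                    // The request timed out while waiting for buffer space.
                    return SendOutcome::Cancelled;
                }
                pending = returned;
                thread::sleep(Duration::from_millis(1));
            }
            // The receiver is gone, so the send can never complete.
            Err(TrySendError::Disconnected(_)) => return SendOutcome::Cancelled,
        }
    }
}

/// Send `events` items into a buffer of size `capacity` that nothing drains,
/// and return how many sends were cancelled (the would-be metric value).
fn count_cancelled(capacity: usize, events: u32, timeout: Duration) -> u32 {
    // `_rx` stays alive for the whole function, so the channel is full, not closed.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut cancelled = 0;
    for event in 0..events {
        if send_with_timeout(&tx, event, timeout) == SendOutcome::Cancelled {
            cancelled += 1;
        }
    }
    cancelled
}

fn main() {
    // Two buffer slots, five events, no consumer: three sends are cancelled.
    let cancelled = count_cancelled(2, 5, Duration::from_millis(5));
    println!("cancelled sends: {cancelled}");
}
```

In this toy model the cancelled count stays at zero while the buffer has headroom and starts climbing as soon as it fills, which is the early-warning behaviour we rely on.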
This conflicts too much with #24186 and #24183. I am closing it and will try another approach.