fix(datadog_agent source): Silence the "Source send cancelled." error

Open bruceg opened this issue 7 months ago • 3 comments

Summary

Eliminated the "Source send cancelled." error and corresponding metric for the datadog_agent source, as Datadog Agent will always resend events when the connection is dropped after a timeout.

Change Type

  • [x] Bug fix
  • [ ] New feature
  • [ ] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes
  • [x] No

How did you test this PR?

Does this PR include user facing changes?

  • [x] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

bruceg, May 23 '25 19:05

Datadog Report

Branch report: bruceg/OPA-3143-datadog-agent Commit report: 216ae08 Test service: vector

:x: 1 Failed (1 Known Flaky), 7 Passed, 0 Skipped, 25.47s Total Time

:x: Failed Tests (1)

  • datadog::logs::validate - vector::e2e - :snowflake: Known flaky - Details

    Test has failed

This is reliably breaking the datadog-logs E2E test. It appears that the agent-vector configuration is not delivering logs to Vector, but I can't see why. I have reproduced the failure on my system, but the same test also fails on current master, so I have no idea where to go from here.

bruceg, May 26 '25 23:05

Hello,

I'm not sure if you're taking feedback on this PR, but my team uses Vector extensively and we find this error and associated metric very useful. "Source send cancelled" due to a timeout can mean that Vector is saturated and/or that a component is applying backpressure. "Source send cancelled" can be a symptom of various capacity issues including (but not limited to) saturated sink buffers, undersized compute, or even just general slowness on whatever's on the other side of the sink (e2e acknowledgements affect some but not all of these).

If possible, we'd like to keep these metrics and log events. They act as an early warning sign for us because even though the Datadog Agent will retransmit, an increase in "Source send cancelled" errors means that we are beginning to erode our safety margin.
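
As a rough illustration of the mechanism described above, and not Vector's actual implementation, the sketch below models a source handing events to a bounded downstream buffer with a deadline. When nothing drains the buffer (backpressure), sends start timing out, and counting those timeouts gives the kind of early-warning signal discussed here. The names `SendOutcome`, `send_with_deadline`, and `count_cancelled` are hypothetical and exist only for this example.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of one attempt to hand an event to the downstream buffer.
/// (Hypothetical type for this sketch; not part of Vector.)
#[derive(Debug, PartialEq)]
enum SendOutcome {
    /// The event was accepted by the buffer.
    Delivered,
    /// The deadline passed while the buffer was still full.
    Cancelled,
    /// The receiving side no longer exists.
    Closed,
}

/// Tries to push `event` into the bounded channel until `timeout` elapses.
fn send_with_deadline<T>(tx: &SyncSender<T>, mut event: T, timeout: Duration) -> SendOutcome {
    let deadline = Instant::now() + timeout;
    loop {
        match tx.try_send(event) {
            Ok(()) => return SendOutcome::Delivered,
            Err(TrySendError::Disconnected(_)) => return SendOutcome::Closed,
            Err(TrySendError::Full(returned)) => {
                if Instant::now() >= deadline {
                    // Downstream never made room in time: give up on this event.
                    return SendOutcome::Cancelled;
                }
                // Take the event back and retry after a short pause.
                event = returned;
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// Sends `total` events into a buffer of size `capacity` that is never drained
/// and returns how many sends were cancelled by the deadline.
fn count_cancelled(capacity: usize, total: usize, timeout: Duration) -> usize {
    // `_rx` is kept alive so the channel stays open but nothing consumes from it.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    (0..total)
        .filter(|&i| send_with_deadline(&tx, i, timeout) == SendOutcome::Cancelled)
        .count()
}

fn main() {
    // Two events fit in the buffer; the remaining three time out.
    let cancelled = count_cancelled(2, 5, Duration::from_millis(5));
    println!("cancelled sends: {cancelled}");
}
```

In this toy model the cancelled count stays at zero while the buffer has headroom and rises only once the downstream side stops keeping up, which is why a growing count can indicate a shrinking safety margin even when the sender retransmits.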

byronwolfman, Jun 10 '25 13:06

This conflicts too much with #24186 and #24183. I am closing it and will try another approach.

bruceg, Nov 12 '25 22:11