opentelemetry-go
Add warning log on dropped spans in batch span processor
This adds a warning log when the batch span processor drops spans.
https://github.com/open-telemetry/opentelemetry-go/issues/3264
One thing I'm unsure about: the internal logger doesn't already offer a Warn method, so I added one here, but I'm not sure whether we want to keep that change.
https://github.com/open-telemetry/opentelemetry-go/blob/main/internal/global/internal_logging.go#L45-L63
Maybe the info log should move to a lower-priority verbosity number so that there is somewhere to fit warn into this scheme?
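For context, here is a minimal sketch (not the PR's final change) of how a Warn helper could sit alongside the existing Info/Debug/Error helpers, which are thin wrappers around a logr.Logger. logr has no dedicated Warn method, so a warning ends up being an Info call at some chosen verbosity; the V(1) level and the placeholder getLogger below are assumptions for illustration only.

```go
// Sketch of a Warn helper for a logr-based internal logger.
package main

import (
	"fmt"

	"github.com/go-logr/logr"
	"github.com/go-logr/logr/funcr"
)

// getLogger stands in for the package's real logger accessor.
var getLogger = func() logr.Logger {
	return funcr.New(func(prefix, args string) {
		fmt.Println(prefix, args)
	}, funcr.Options{Verbosity: 4}) // emit everything up to V(4)
}

// Warn prints messages about warning states: not an error, but likely more
// important than an informational event. The V(1) verbosity is an assumption.
func Warn(msg string, keysAndValues ...interface{}) {
	getLogger().V(1).Info(msg, keysAndValues...)
}

func main() {
	Warn("spans dropped by batch span processor", "dropped", 42)
}
```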
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: prodion23 / name: Donnie (c97924f7d4ab74f3d908ea91741f88099788b1a2)
@Aneurysm9 thank you for the feedback here.
One thing to draw specific scrutiny towards: unfortunately we can't use the WithBlocking setting for this test, because that code path never increments the drop count.
https://github.com/prodion23/opentelemetry-go/blob/0052a23c290951aeeae8c6adef5c6219d34d8616/sdk/trace/batch_span_processor.go#L362-L363
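To illustrate why, here is a self-contained sketch of the two enqueue behaviours, paraphrased from the linked code with simplified types (the real processor uses ReadOnlySpan and checks sampling; this is not a verbatim copy):

```go
// Minimal sketch of the batch span processor's two enqueue paths.
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

type span struct{ name string }

type processor struct {
	queue   chan span
	dropped uint32
}

// Blocking mode (WithBlocking): wait for queue space or context cancellation.
// The dropped counter is never incremented on this path, which is why a
// "drops are logged" test cannot use WithBlocking.
func (p *processor) enqueueBlockOnQueueFull(ctx context.Context, s span) bool {
	select {
	case p.queue <- s:
		return true
	case <-ctx.Done():
		return false
	}
}

// Non-blocking (default) mode: drop and count when the queue is full.
func (p *processor) enqueueDrop(s span) bool {
	select {
	case p.queue <- s:
		return true
	default:
		atomic.AddUint32(&p.dropped, 1)
	}
	return false
}

func main() {
	p := &processor{queue: make(chan span, 1)}
	p.enqueueDrop(span{"a"}) // fills the queue
	p.enqueueDrop(span{"b"}) // queue full: dropped counter goes to 1
	fmt.Println("dropped:", atomic.LoadUint32(&p.dropped))
}
```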
During the test, it consistently flakes if I run it with a small genNumSpans argument; essentially anything less than 10 fails about 50% of the time.
Setting it to 100 has passed consistently, but I understand if there is concern about using a known-flaky configuration.
With genNumSpans set to 100, the log statement shows that roughly 80 to 90 spans are dropped.
Happy to revisit this test if you have suggestions; a rough shape of the setup is sketched below.
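Roughly, the test looks like this (a minimal sketch under stated assumptions, not the exact test in the PR: the test name, the tiny queue size, the slow exporter, and the elided assertion are all illustrative):

```go
package trace_test

import (
	"context"
	"testing"
	"time"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// slowExporter blocks long enough for the queue to back up, forcing the
// non-blocking enqueue path to drop spans.
type slowExporter struct{}

func (slowExporter) ExportSpans(_ context.Context, _ []sdktrace.ReadOnlySpan) error {
	time.Sleep(100 * time.Millisecond)
	return nil
}

func (slowExporter) Shutdown(context.Context) error { return nil }

func TestBatchSpanProcessorDropsSpansOnFullQueue(t *testing.T) {
	bsp := sdktrace.NewBatchSpanProcessor(slowExporter{},
		sdktrace.WithMaxQueueSize(2), // tiny queue so drops happen quickly
		sdktrace.WithMaxExportBatchSize(2),
		sdktrace.WithBatchTimeout(10*time.Millisecond),
	)
	tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(bsp))
	defer func() { _ = tp.Shutdown(context.Background()) }()

	tracer := tp.Tracer("bsp-drop-test")
	const genNumSpans = 100 // values below 10 only produced drops ~50% of the time
	for i := 0; i < genNumSpans; i++ {
		_, span := tracer.Start(context.Background(), "span")
		span.End()
	}
	// Assertion elided: the PR checks that a warning about the number of
	// dropped spans is logged after the spans are generated.
}
```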
I think using the non-blocking-on-full-queue mode for this test is probably appropriate, but would love to get the opinion of @open-telemetry/go-approvers. This component has a long history of flaky tests, but it looks like the test you have proposed should be fairly reliable.
Please ensure that all commits are associated with a user that has signed the CLA as we will be unable to proceed further (and some approvers may not review the PR) until the CLA is signed.
It looks like the added test is quite flaky or does not succeed on Windows: https://github.com/open-telemetry/opentelemetry-go/actions/runs/3380083340/jobs/5613037464
We need our testing system to be solid, so this needs to be refactored or redesigned to ensure the tests pass.
I updated the test to make it more robust. @MrAlias and @Aneurysm9, can you please review to see if this would be acceptable?
I didn't test on Windows, but I found that the old version would often fail when run with `go test -run ^TestBatchSpanProcessorLogsWarningOnNewDropSpans$ -count 50`, whereas this version doesn't.
Codecov Report

Merging #3289 (9af553a) into main (291aaa0) will increase coverage by 0.2%. The diff coverage is 100.0%.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #3289     +/-   ##
========================================
+ Coverage    78.0%   78.2%    +0.2%
========================================
  Files         165     165
  Lines       11755   11762       +7
========================================
+ Hits         9177    9207      +30
+ Misses       2382    2359      -23
  Partials      196     196

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdk/trace/batch_span_processor.go | 82.0% <100.0%> (+0.9%) | :arrow_up: |
| sdk/resource/resource.go | 83.5% <0.0%> (+5.7%) | :arrow_up: |
| sdk/trace/provider.go | 95.5% <0.0%> (+6.2%) | :arrow_up: |
Looks like the tests are still failing: https://github.com/open-telemetry/opentelemetry-go/actions/runs/3586714280/jobs/6036245937
I'm more in favor of stable testing than accepting these changes as is.
> I'm more in favor of stable testing than accepting these changes as is.

Me too. The problem here is that, to observe dropped spans, we can't use the blocking BSP, because it never drops any spans.
> Looks like the tests are still failing: https://github.com/open-telemetry/opentelemetry-go/actions/runs/3586714280/jobs/6036245937
> I'm more in favor of stable testing than accepting these changes as is.

The test race appears to be fixed by 9af553a. Are there more changes required before this can merge?
> Are there more changes required before this can merge?

"This branch has conflicts that must be resolved" 😉
@prodion23 Are you planning to refresh the PR and address the comment(s)?
Closing this PR as stale and with unresolved conflicts.