tetragon
exporter: fix flaky Test_rateLimitExport with polling
Summary
This PR fixes the flaky Test_rateLimitExport test by replacing the fixed sleep duration with a polling-based synchronization mechanism, addressing the timing race condition reported in #2789.
Related Issue
Fixes #2789
Root Cause
The test used a fixed 200ms sleep, which spans up to four rate limiter ticker intervals (50ms each), so the ticker could fire several times during the wait period. This created a timing window where:
- Multiple tickers could emit rate-limit-info messages if events were still processing
- A race condition at ticker boundaries could cause off-by-one errors in event counts
- No synchronization existed between "events sent" and "events fully processed"
Proposed Changes
- Added countEvents() helper function: counts events and rate-limit-info messages without blocking and without making assertions
- Replaced fixed sleep with polling: Poll every 10ms until expected number of events and rate-limit-info messages are received
- Added timeout with clear diagnostics: 500ms timeout (2.5× original sleep) with descriptive error message showing actual vs. expected counts
Testing Performed
- Test passes 20 consecutive runs without failures (verified via Docker with golang:1.25)
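- No performance degradation: the test should typically complete faster than the original, since polling returns as soon as the expected counts are observed instead of always waiting the full 200ms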
- The outcome no longer depends on system timing or load, provided the expected counts arrive within the 500ms timeout
Backward Compatibility
No breaking changes. The fix only modifies the test implementation, not the rate limiter functionality itself.
Changelog
This PR fixes a flaky test with no user-facing changes. Could a maintainer please add the release-note/misc label? Thanks!