Add limit for metric aggregation
Description
We have an upper bound for flushing the metrics buffer (currently set to 1000 items), but this does not prevent the SDK from queuing items faster than we can flush them. We therefore want to introduce a second upper bound at which we start dropping incoming metrics.
Hi!
Is this still open? I’m interested in contributing.
Understanding the Issue:
The metrics buffer currently flushes at 1000 items, but in high-throughput scenarios metrics can arrive faster than network requests complete. Since sendEnvelope() is async, metrics may continue to queue while previous flushes are still in-flight, potentially causing unbounded buffer growth.
Proposed Approach: I think a reasonable option would be to introduce a drop limit that accounts for both buffered metrics and in-flight metrics (i.e. metrics that have been flushed but not yet sent).
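A rough sketch of what I have in mind (maxMetricDropLimit and the aggregator shape here are just illustrative, not the actual SDK internals):

```ts
// Sketch only – names and structure are illustrative, not the real aggregator.
type Metric = Record<string, unknown>;

class MetricsAggregator {
  private _buffer: Metric[] = [];
  private _inFlightCount = 0;

  constructor(
    private readonly _sendEnvelope: (batch: Metric[]) => Promise<void>,
    private readonly _flushAt: number = 1000, // existing flush threshold
    private readonly _maxMetricDropLimit: number = 2000, // proposed hard limit
  ) {}

  add(metric: Metric): void {
    // New upper bound: drop incoming metrics once buffered + in-flight
    // metrics reach the hard limit.
    if (this._buffer.length + this._inFlightCount >= this._maxMetricDropLimit) {
      return; // optionally record a client report for the dropped metric
    }

    this._buffer.push(metric);

    // Existing upper bound: flush once the buffer is full.
    if (this._buffer.length >= this._flushAt) {
      this.flush();
    }
  }

  flush(): void {
    if (this._buffer.length === 0) {
      return;
    }
    const batch = this._buffer.splice(0, this._buffer.length);

    // Mark the flushed metrics as in-flight until the request settles.
    this._inFlightCount += batch.length;
    void this._sendEnvelope(batch).finally(() => {
      this._inFlightCount -= batch.length;
    });
  }
}
```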
where maxMetricDropLimit could default to something like 2000. We'd mark metrics as “in-flight” when they are flushed and decrement the in-flight count when the network request completes.
If you like this direction, I’m happy to implement it. I’ll also add/update tests, ensure it adheres to any existing style/contribution guidelines, and document the behavior (e.g. what happens when the limit is hit).
Could you please assign this issue to me? I can open a PR once I get confirmation.
Hey, yes, that is still open. We are open to contributions and I am very curious about your implementation (on the PR itself @chargome will have the final word, though).
About your proposed approach - I don't think I 100% understand it yet, but I expect it will become clear once the implementation is there.
Hi @JPeer264, the command yarn run test fails immediately after setting up the repository. Here's the output: https://cloud.nx.app/runs/5IQYPtGbpi. However, the tests pass when run individually. Can you assist me with this?
Hey @JPeer264 @chargome, I've created a draft PR (#18467) for you to review. When you have a moment, could you review the implementation? Once I receive your feedback, I'll address the failing size check workflow. Thanks!
Thanks a lot for the PR, I'll take a look as soon as I can. About the tests: it could be that some of them are flaky (I'll look into that) - did you manage to run them individually?
Hey there! Thanks so much for taking the time to investigate this and for putting together a PR @malay44 — we really appreciate the effort.
At the moment, we have some ongoing work and traction around the general telemetry buffer (link), so we won’t be able to merge a metrics-specific PR right now. There are upcoming changes planned for the shared buffer logic, and we want to avoid introducing a separate mechanism ahead of that. Additionally, a metrics-only solution would increase bundle size, which we’d like to avoid.
For now, we’ll continue dropping metrics at the transport layer.
Closing this issue for now.