kit icon indicating copy to clipboard operation
kit copied to clipboard

metrics/dogstatsd: always reset all metrics before writing them when calling WriteTo

Open skwair opened this issue 3 years ago • 11 comments

Description

This PR fixes an issue that occurred when the Dogstatsd client could not reach the Datadog agent to send metrics. It would fail to send counters, return and never reset timings and histograms, resulting in an ever-increasing memory consumption while the client cannot reach the Datadog agent.

skwair avatar Jun 02 '22 08:06 skwair

If local state fails to get sent to Datadog and you reset it anyway, doesn't that lose information and invalidate your metrics?

peterbourgon avatar Jun 02 '22 16:06 peterbourgon

It does, but it's coherent with the method's comment I suppose: "WriteTo abides best-effort semantics, so observations are lost if there is a problem with the write.". The current implementation already loses some information if you have multiple counters for example but you fail to send the first one, no?

We could chose to buffer these metrics while the connection is down (i.e.: only resetting when all writes are successful) instead but then we expose ourselves to the ever-growing memory issue if it lasts for too long.

skwair avatar Jun 03 '22 07:06 skwair

That's fair, I overlooked that caveat in the docs.

peterbourgon avatar Jun 03 '22 12:06 peterbourgon

Is it ok for you or should I update the PR?

skwair avatar Jun 10 '22 15:06 skwair

Hello, sorry for bumping this again, but we would need this fix on our side, any chance this gets merged? Or do you see another implementation for fixing this issue?

skwair avatar Jul 20 '22 08:07 skwair

@peterbourgon can you please take a look at this? It, as a dependency, affects an app in our production env for two months and a downgrade is not an option.

MaxymVlasov avatar Aug 01 '22 15:08 MaxymVlasov

I'm happy to merge with a test that fails on current master and passes on the branch.

peterbourgon avatar Aug 01 '22 17:08 peterbourgon

Hello, thanks for your reply.

I added a test that fails without the fix as you asked. Let me know if it works for you.

skwair avatar Aug 02 '22 10:08 skwair

If I understand correctly, to notify @peterbourgon about comments, need to mention him

MaxymVlasov avatar Aug 09 '22 12:08 MaxymVlasov

Hello @peterbourgon, did you have time to check if the tests added match what you were asking for?

skwair avatar Sep 07 '22 08:09 skwair

@peterbourgon @ChrisHines @basvanbeek friendly ping

ldez avatar Nov 18 '22 15:11 ldez