dd-trace-go icon indicating copy to clipboard operation
dd-trace-go copied to clipboard

profiler: restore mutex fraction during stop()

Open felixge opened this issue 3 years ago • 4 comments

@pmbauer what do you think about this? IMO it would be nice if the profiler completely cleans up after itself, i.e. restores all runtime profiling rates to their previous settings.

It gets a little tricky with rates that don't allow us to read their previous value, e.g. SetBlockProfileRate but I think for those we could simply set them to 0 to disable them.

Don't merge this yet, I'm just kicking it off to get your thoughts on the matter.

felixge avatar Jan 18 '21 09:01 felixge

Some thoughts ... given both block profile rates and mutex fraction are shared mutable state that the client can modify at any time on their own even after the profiler is started, assuming the user wants the value reset to the exact value at the point the profiler was started won't be sound in all cases. This is not just a hypothetical in that all our dd-go apps can modify these runtime values dynamically with a consul update. And of course the runtime's inconsistency here between mutex and block rates makes the aesthetics poor.

I could see an argument for always setting mutex and block rates to 0 when the profiler stops before treating the two inconsistently. We do call out the respective runtime calls in our docs, but maybe just making this more explicit that starting the profiler will change this runtime state on start is the best overall? Given the users can themselves change these runtime values anytime, maybe that's best?

If the runtime values were exclusive to the profiler and if we could be consistent with handling the two rates I could see providing this auto-reset feature.

More musings ... Adding option wrappers for these two values was as much a convenience as it was a place to hang clarifying documentation on the use of these rates. Maybe it would be better to not release that change (the options for block and mutex rates) and just hang the docs on modifying the runtime directly someplace else?

pmbauer avatar Jan 18 '21 15:01 pmbauer

@pmbauer thanks for your careful thoughts on this!

given both block profile rates and mutex fraction are shared mutable state that the client can modify at any time on their own even after the profiler is started, assuming the user wants the value reset to the exact value at the point the profiler was started won't be sound in all cases. This is not just a hypothetical in that all our dd-go apps can modify these runtime values dynamically with a consul update. And of course the runtime's inconsistency here between mutex and block rates makes the aesthetics poor.

Good point. But I think the main question is whether or not this should be seen as the edge case or the normal case going forward. Depending on this we could make the cleanup opt-in or opt-out via an option.

I could see an argument for always setting mutex and block rates to 0 when the profiler stops before treating the two inconsistently. We do call out the respective runtime calls in our docs, but maybe just making this more explicit that starting the profiler will change this runtime state on start is the best overall? Given the users can themselves change these runtime values anytime, maybe that's best?

Yeah, regardless of what we do for the code, we should definitely document that the profiler is manipulating global state in the runtime.

If the runtime values were exclusive to the profiler and if we could be consistent with handling the two rates I could see providing this auto-reset feature.

More musings ... Adding option wrappers for these two values was as much a convenience as it was a place to hang clarifying documentation on the use of these rates. Maybe it would be better to not release that change (the options for block and mutex rates) and just hang the docs on modifying the runtime directly someplace else?

What do you mean by "Maybe it would be better to not release that change"? I thought that ship has already sailed by merging into the v1 branch? Maybe I'm not understanding the release model yet.

Anyway, we can chat about this a bit during out pairing today.

felixge avatar Jan 19 '21 09:01 felixge

What do you mean by "Maybe it would be better to not release that change"? I thought that ship has already sailed by merging into the v1 branch?

yes :( as a hypothetical it's not ideal to revert that change although it hasn't been included as part of an official point release yet. I think having the wrapper options for modifying the sample rates is worth while; e.g. some hypothetical future where we aren't using the runtime profiler (unlikely, but nice to have the interface to the profiler all in one place rather than scattered between our profile options and runtime.* calls)

But I think the main question is whether or not this should be seen as the edge case or the normal case going forward.

As far as normal vs edge: normal case is to start the profiler on application start and stop it on application stop, so reset on stop is itself an edge case.

pmbauer avatar Jan 19 '21 14:01 pmbauer

As far as normal vs edge: normal case is to start the profiler on application start and stop it on application stop, so reset on stop is itself an edge case.

Agreed. Perhaps this is more of a craftsmanship thing that will increase the perceived quality of the profiler pkg. I can only speak for myself, but I loathe libraries/APIs that don't cleanup after themselves ; ).

Anyway, this is part of me trying to pick some low-hanging fruits, and clearly this isn't as low-hanging as I was hoping. So let's keep this open for now and circle back to it later?

felixge avatar Jan 20 '21 07:01 felixge