go-grpc-prometheus icon indicating copy to clipboard operation
go-grpc-prometheus copied to clipboard

Make EnableClient* methods concurrent safe

Open jeffwidman opened this issue 4 years ago • 10 comments

In one of my apps, I'm hitting a data race error in EnableClientHandlingTimeHistogram. It looks like the problem is that the EnableClient* methods are not concurrent safe.

The problem is triggered when two grpc clients each call grpc_prometheus.EnableClientHandlingTimeHistogram(). They end up sharing the same DefaultClientMetrics which results in two calls to DefaultClientMetrics.EnableClientHandlingTimeHistogram(). That causes the race detector to throw an error about that method not being concurrent safe.

Rather than adding a generic lock, I used sync.Once since it looks like these are only intended to be called once.

jeffwidman avatar May 07 '20 19:05 jeffwidman

Codecov Report

Merging #90 into master will decrease coverage by 0.32%. The diff coverage is 25.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   72.45%   72.13%   -0.33%     
==========================================
  Files          11       11              
  Lines         363      366       +3     
==========================================
+ Hits          263      264       +1     
- Misses         89       91       +2     
  Partials       11       11              
Impacted Files Coverage Δ
client_metrics.go 62.85% <25.92%> (-0.65%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9abf3eb...7961b86. Read the comment docs.

codecov-io avatar May 07 '20 19:05 codecov-io

nudge @brancz any chance you could look at this? Should be a relatively low-risk change.

jeffwidman avatar May 11 '20 18:05 jeffwidman

Hi @jeffwidman,

Thanks for this, but I think it would be awesome if we can do this on v2 we plan... and the plan big as we design v2 of prometheus middleware to go inside go-grpc-middleware v2 So... I might think your work will be a bit wasted if we would merge this ):

But!

If you want, you can help us to move prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2

We need a brave volunteer! (: I just created an issue for this: https://github.com/grpc-ecosystem/go-grpc-prometheus/issues/91

bwplotka avatar May 12 '20 09:05 bwplotka

Hmm... The new v2 plan looks interesting, and I'll need to read more.

However, this is a bugfix, and a non-breaking one.

So why not merge it to v1? #91 mentions continuing to accept bugfixes to the v1 branch for a period of time during the migration period.

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

Again, I think migrating this to be part of all the go-grpc-middleware might be interesting (I need to read/think on it more), all I'm saying I don't see how that affects whether to merge this given that this will apply to both v1 and v2.

jeffwidman avatar May 12 '20 15:05 jeffwidman

nudge @bwplotka what do you think about my comment ☝️...

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

jeffwidman avatar May 20 '20 23:05 jeffwidman

In fact, merging to v1 now would make sure it is included in both v1 and v2 stuff, so the fix wouldn't need to later be backported.

Not really, it won't be included in v2. We already split branches.

bwplotka avatar May 21 '20 07:05 bwplotka

And we have totally separate implementation so it won't matter much. And yes, if it's a bug we can merge.. But is it?

bwplotka avatar May 21 '20 07:05 bwplotka

BTW I am almost sure you might be using this library wrongly. .EnableClientHandlingTimeHistogram() is literally config option. You should use it once in your application. Why you use it concurrently?

bwplotka avatar May 21 '20 07:05 bwplotka

BTW I am almost sure you might be using this library wrongly. .EnableClientHandlingTimeHistogram() is literally config option. You should use it once in your application. Why you use it concurrently?

It happens when running unit tests if two unit tests both spin up the server (which is a common thing). If you don't run go test -p 1 ./... then go runs the unit tests within separate packages concurrently causing the conflict.

jeffwidman avatar Jul 15 '20 18:07 jeffwidman

Hi 👋🏽

Sorry for the massive, lag. We consolidating projects and moving to single repo where we have more control and awareness. We are moving this code base longer to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2 and we moved existing state of Prometheus middleware to https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics

This means that before we release v2 this is the best opportunity to shape new https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics with whatever we want to change in API 🤗 If you want to change something effectively long term, I would suggest switching your PR and rebasing it on https://github.com/grpc-ecosystem/go-grpc-middleware/tree/v2/providers/openmetrics instead (notice v2 branch, not master!).

Sorry for the confusion, but it's needed for the project sustainability. Cheers!

Regarding the change I think I understand it, cool. Please check v2, because we changed the code to never use globals, so things should be concurrency friendly

bwplotka avatar Mar 15 '21 15:03 bwplotka