client_python icon indicating copy to clipboard operation
client_python copied to clipboard

CollectorRegistry undefined order of unlocking

Open tadeas-vintrlik opened this issue 1 year ago • 5 comments

The CollectorRegistry uses theading.Lock for mutexing purposes. It does not have a defined order of unlocks. In our use-case sometimes three or more actions happen almost simultaneously and sometime this would case them to appear in the wrong order.

The easiest solution in my opinion would be to use threading.Semaphore instead.

tadeas-vintrlik avatar Aug 02 '23 06:08 tadeas-vintrlik

I would even be willing to create a PR, however before that I would like to know whether this would be desired in the first place. As a temporary work around I used a threading.Semaphore in our exporter on top of CollectorRegistry this however is largely unnecessary as the locking is implemented twice: once in our exporter and once in the client library.

tadeas-vintrlik avatar Aug 11 '23 14:08 tadeas-vintrlik

Can you please explain more about your issue?

roidelapluie avatar Aug 16 '23 05:08 roidelapluie

Sure, our use-case might a bit of a stretch for a pull-based problem. I wrote a simple exporter using this library to expose the metrics of our CI (buildbot). While one already exists https://github.com/claws/buildbot-prometheus it is not very useful for us.

What we need is the state of the build: running, pending, success, failed, etc. These events come asynchronously via buildbot's master Message Queue. What would quite often happen is that there would three state changes very close of one another say pending -> running -> success within a couple ms.

The pending would acquire the lock for the CollectorRegistry as would the other two. Then since the order of unlocking is not defined for Locks the events would end up in the opposite order there fore it would appear as pending -> success -> running and the build would appear to hang on running even when it was not. This would not change until it was run again.

This could be solved by using a Semaphore instead which has defined order of unlocking in the order of acquiring them.

I understand this situation is more suitable for an event based push system, but for now it is working for us as is with the addition of a Semaphore on top of CollectorRegistry

tadeas-vintrlik avatar Aug 17 '23 06:08 tadeas-vintrlik

What is the performance cost of this change?

roidelapluie avatar Sep 03 '23 21:09 roidelapluie

I was unable to find a resource on the exact performance difference of Lock and Semaphore in Python.

However the difference should not be that noticeable since a lock is a special case (unary) semaphore. The only difference is that lock only holds boolean value whether it is open or not whereas a semaphore has a counter that increments and decrements. The counter then decides which thread should go into the critical section next.

tadeas-vintrlik avatar Sep 13 '23 07:09 tadeas-vintrlik