semantic-metrics icon indicating copy to clipboard operation
semantic-metrics copied to clipboard

Add synchronized around getCount dist usage

Open spkrka opened this issue 2 years ago • 6 comments

Since TDigest objects are not thread safe, we need to protect reads from getting corrupted data or potentially throwing exceptions.

spkrka avatar Jul 16 '21 11:07 spkrka

Without this change, it is possible for one thread to read the size of the TDigest at the same time as it's being updated by the record method. Since TDigest is not a thread safe object, this could result in either incorrect output from size, or possibly even unexpected exceptions

spkrka avatar Jul 16 '21 13:07 spkrka

Without this change, it is possible for one thread to read the size of the TDigest at the same time as it's being updated by the record method. Since TDigest is not a thread safe object, this could result in either incorrect output from size, or possibly even unexpected exceptions

I am not sure this is necessary. Tdigest is accessed through an atomicReference. Every operation is atomic so we will never missed any count.

If locking is not an issue, we can synchonized every method and remove the atomicReference. I don't think we need sychronized and the atomic reference.

ao2017 avatar Jul 16 '21 14:07 ao2017

One thread does this: distRef.get().add(..)

Another thread does this: distRef.get().size()

Having an AtomicReference does not prevent add(..) from being called at the same time as size()

spkrka avatar Jul 16 '21 14:07 spkrka

One thread does this: distRef.get().add(..)

Another thread does this: distRef.get().size()

Having an AtomicReference does not prevent add(..) from being called at the same time as size()

One thread does this: distRef.get().add(..)

Another thread does this: distRef.get().size()

Having an AtomicReference does not prevent add(..) from being called at the same time as size()

That's correct. my bad. It has been a while :). I was more concern about thread safety around add / read operation. I was ok with size being off. I don't see a problem making getCount precise but that will increase context switching. What kind of stat are you getting on locking with this implementation ?

ao2017 avatar Jul 16 '21 14:07 ao2017

I did not measure, and I am not sure how getCount is used - is it used for anything important?

spkrka avatar Jul 16 '21 15:07 spkrka

I did not measure, and I am not sure how getCount is used - is it used for anything important?

Almost every 30 seconds, whenever ffwdReporter reports metrics.

ao2017 avatar Jul 16 '21 15:07 ao2017