ttlcache icon indicating copy to clipboard operation
ttlcache copied to clipboard

use atomics for metrics

Open aathan opened this issue 2 years ago • 4 comments

AFAIK metricsMu is not really performing any useful function over and above the likely more efficient atomic integer operations available on most modern CPUs. In the Metric() return function, the metricsMu served to ensure all the counters are returned as of an instant in time when no other counter is incremented relative to other counters, but since the write locks are only held for the duration of an increment of a single element of the structure, even that has no utility (i.e., there are never increments of two counters at once while holding the lock across both increments, which is the only thing the read lock would protect). Finally, the atomic loads probably collapse to simple reads on many CPUs, with the atomic load possibly only forcing cpu cache coherency (i.e., you can probably just write Metrics(){return c.metrics} and be just fine).

aathan avatar Nov 24 '22 23:11 aathan

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs

swithek avatar Nov 26 '22 14:11 swithek

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs

name                     old time/op    new time/op    delta
CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)

name                     old alloc/op   new alloc/op   delta
CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)

name                     old allocs/op  new allocs/op  delta
CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)

gaydin avatar Nov 27 '22 13:11 gaydin

I also noticed this potential improvement, glad to see opened PR for this (:

@swithek do you have any concerns regarding this improvement or regarding this particular PR?

biosvs avatar Jan 19 '23 20:01 biosvs

I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:

likely more efficient atomic integer operations available on most modern CPUs

name                     old time/op    new time/op    delta
CacheSetWithoutTTL-8        290ns ± 0%     288ns ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8     516ns ± 0%     463ns ± 0%   ~     (p=1.000 n=1+1)

name                     old alloc/op   new alloc/op   delta
CacheSetWithoutTTL-8        99.0B ± 0%     97.0B ± 0%   ~     (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8      256B ± 0%      253B ± 0%   ~     (p=1.000 n=1+1)

name                     old allocs/op  new allocs/op  delta
CacheSetWithoutTTL-8         2.00 ± 0%      2.00 ± 0%   ~     (all equal)
CacheSetWithGlobalTTL-8      4.00 ± 0%      4.00 ± 0%   ~     (all equal)

Having seen the benchmark results, I'm not entirely convinced this would be a good change. It doesn't look like an improvement at all. Perhaps the banchmark tests should be different to properly test this?

swithek avatar Jan 19 '23 21:01 swithek