ttlcache
ttlcache copied to clipboard
use atomics for metrics
AFAIK metricsMu is not really performing any useful function over and above the likely more efficient atomic integer operations available on most modern CPUs. In the Metric() return function, the metricsMu served to ensure all the counters are returned as of an instant in time when no other counter is incremented relative to other counters, but since the write locks are only held for the duration of an increment of a single element of the structure, even that has no utility (i.e., there are never increments of two counters at once while holding the lock across both increments, which is the only thing the read lock would protect). Finally, the atomic loads probably collapse to simple reads on many CPUs, with the atomic load possibly only forcing cpu cache coherency (i.e., you can probably just write Metrics(){return c.metrics} and be just fine).
I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:
likely more efficient atomic integer operations available on most modern CPUs
I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:
likely more efficient atomic integer operations available on most modern CPUs
name old time/op new time/op delta
CacheSetWithoutTTL-8 290ns ± 0% 288ns ± 0% ~ (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8 516ns ± 0% 463ns ± 0% ~ (p=1.000 n=1+1)
name old alloc/op new alloc/op delta
CacheSetWithoutTTL-8 99.0B ± 0% 97.0B ± 0% ~ (p=1.000 n=1+1)
CacheSetWithGlobalTTL-8 256B ± 0% 253B ± 0% ~ (p=1.000 n=1+1)
name old allocs/op new allocs/op delta
CacheSetWithoutTTL-8 2.00 ± 0% 2.00 ± 0% ~ (all equal)
CacheSetWithGlobalTTL-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
I also noticed this potential improvement, glad to see opened PR for this (:
@swithek do you have any concerns regarding this improvement or regarding this particular PR?
I think you are right and this might be a good change. Before we merge this, have you done any benchmarking here? I am asking because of this bit:
likely more efficient atomic integer operations available on most modern CPUs
name old time/op new time/op delta CacheSetWithoutTTL-8 290ns ± 0% 288ns ± 0% ~ (p=1.000 n=1+1) CacheSetWithGlobalTTL-8 516ns ± 0% 463ns ± 0% ~ (p=1.000 n=1+1) name old alloc/op new alloc/op delta CacheSetWithoutTTL-8 99.0B ± 0% 97.0B ± 0% ~ (p=1.000 n=1+1) CacheSetWithGlobalTTL-8 256B ± 0% 253B ± 0% ~ (p=1.000 n=1+1) name old allocs/op new allocs/op delta CacheSetWithoutTTL-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) CacheSetWithGlobalTTL-8 4.00 ± 0% 4.00 ± 0% ~ (all equal)
Having seen the benchmark results, I'm not entirely convinced this would be a good change. It doesn't look like an improvement at all. Perhaps the banchmark tests should be different to properly test this?