WIP #184 add async AddAsync and RemoveAsync to ICache
@MichaCo
Attempted to add AddAsync and RemoveAsync methods to ICache. Not sure what to do with .net 4.5 and lower because it does not support Task.CompletedTask and Task.FromResult. Any advice? Could we add async support only to .netstandard?
Task.FromResult should be available in .NET 4.5. Completed Task is just a shortcut QOL feature which returns a cached completed task. We can add our own I guess.
One more thing, I'm not sure if we really have to implement two complete code paths for sync and async. Maybe with ValueTask<T> we can have only one? Would need some performance testing though as ValueTask can produce more allocation in some cases
Hi @MichaCo,
Task.FromResult should be available in .NET 4.5. Completed Task is just a shortcut QOL feature which returns a cached completed task. We can add our own I guess.
Updated defined to include net4.5. So only net 4.0 does not have async support. Is it OK?
Added GetCacheItemAsync to ICache and added benchmark tests for async methods: GetSingleBenchmark and AddSingleBenchmark.
On my machine I get following results:
AddSingleBenchmark
BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Runtime=Clr LaunchCount=1 TargetCount=10
WarmupCount=4
| Method | Mean | Error | StdDev | Median | Op/s | Scaled | ScaledSD | Rank | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Dictionary | 245.1 ns | 3.064 ns | 1.823 ns | 245.3 ns | 4,079,270.3 | 1.00 | 0.00 | 1 | 0.0439 | 184 B |
| DictionaryAsync | 522.9 ns | 7.752 ns | 4.613 ns | 521.9 ns | 1,912,250.9 | 2.13 | 0.02 | 2 | 0.0725 | 304 B |
| Runtime | 1,468.7 ns | 62.331 ns | 41.228 ns | 1,463.9 ns | 680,871.8 | 5.99 | 0.16 | 5 | 0.7534 | 3168 B |
| RuntimeAsync | 1,856.5 ns | 39.374 ns | 26.044 ns | 1,853.4 ns | 538,644.1 | 7.57 | 0.11 | 6 | 0.7820 | 3288 B |
| MsMemory | 801.9 ns | 34.341 ns | 22.714 ns | 790.3 ns | 1,247,059.1 | 3.27 | 0.09 | 3 | 0.1068 | 448 B |
| MsMemoryAsync | 1,301.3 ns | 18.890 ns | 12.494 ns | 1,297.4 ns | 768,482.7 | 5.31 | 0.06 | 4 | 0.3414 | 1432 B |
| Redis | 386,316.3 ns | 4,980.507 ns | 3,294.297 ns | 386,087.5 ns | 2,588.6 | 1,575.97 | 16.85 | 9 | - | 1372 B |
| RedisAsync | 431,304.8 ns | 6,662.711 ns | 4,406.971 ns | 430,890.5 ns | 2,318.5 | 1,759.50 | 21.03 | 10 | 0.4883 | 3772 B |
| Memcached | 354,375.7 ns | 4,301.594 ns | 2,249.820 ns | 354,996.2 ns | 2,821.9 | 1,445.66 | 13.26 | 7 | 2.9297 | 13971 B |
| MemcachedAsync | 362,075.7 ns | 5,459.093 ns | 3,610.852 ns | 361,915.5 ns | 2,761.9 | 1,477.08 | 17.38 | 8 | 2.9297 | 14088 B |
GetSingleBenchmark
BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Runtime=Clr LaunchCount=1 TargetCount=10
WarmupCount=4
| Method | Mean | Error | StdDev | Median | Op/s | Scaled | ScaledSD | Rank | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Dictionary | 79.11 ns | 0.5664 ns | 0.3746 ns | 78.97 ns | 12,640,504.2 | 1.00 | 0.00 | 1 | - | 0 B |
| DictionaryAsync | 200.59 ns | 1.8190 ns | 1.0825 ns | 200.24 ns | 4,985,332.5 | 2.54 | 0.02 | 2 | 0.0191 | 80 B |
| Runtime | 299.25 ns | 1.5175 ns | 0.9030 ns | 299.42 ns | 3,341,697.1 | 3.78 | 0.02 | 4 | 0.0496 | 208 B |
| RuntimeAsync | 435.37 ns | 5.0081 ns | 2.2236 ns | 436.33 ns | 2,296,887.3 | 5.50 | 0.04 | 6 | 0.0682 | 288 B |
| MsMemory | 228.52 ns | 3.2292 ns | 2.1359 ns | 227.86 ns | 4,375,908.9 | 2.89 | 0.03 | 3 | - | 0 B |
| MsMemoryAsync | 360.95 ns | 1.8701 ns | 1.2370 ns | 360.61 ns | 2,770,464.4 | 4.56 | 0.03 | 5 | 0.0191 | 80 B |
| Redis | 252,553.58 ns | 5,189.8029 ns | 3,432.7334 ns | 252,332.28 ns | 3,959.6 | 3,192.47 | 43.57 | 7 | - | 2011 B |
| RedisAsync | 252,382.93 ns | 4,476.1477 ns | 2,960.6947 ns | 253,102.04 ns | 3,962.2 | 3,190.31 | 38.26 | 7 | 0.4883 | 2096 B |
| Memcached | 270,045.65 ns | 5,271.5009 ns | 3,486.7716 ns | 270,649.05 ns | 3,703.1 | 3,413.58 | 44.51 | 9 | 3.4180 | 14393 B |
| MemcachedAsync | 264,300.18 ns | 3,740.8318 ns | 2,474.3287 ns | 265,055.46 ns | 3,783.6 | 3,340.95 | 33.22 | 8 | 3.4180 | 14473 B |
For distributed cached I don't see much difference between sync and async version, but for Dictionary it is 2 times slower. It is probably because of async/await overhead.
Yes its find to skip .NET40, I'll probably drop support for that soon anyways.
Thanks for doing some performance testing, some interesting results actually.
I think we have to work on the allocations when adding items. The async Add almost triples the bytes allocated across all the different implementations.
Haven't looked at your code changes yet. Will review it soon
I will try to replace Task to ValueTask and will run the benchmark again. Will post my results here.
@MichaCo
I created a separate branch and changed all Task to ValueTask. See benchmarks with ValueTask below:
AddSingleBenchmark
BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Runtime=Clr LaunchCount=1 TargetCount=10
WarmupCount=4
| Method | Mean | Error | StdDev | Median | Op/s | Scaled | ScaledSD | Rank | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Dictionary | 247.1 ns | 3.239 ns | 2.142 ns | 246.7 ns | 4,046,226.5 | 1.00 | 0.00 | 1 | 0.0439 | 184 B |
| DictionaryAsync | 537.3 ns | 2.536 ns | 1.509 ns | 536.8 ns | 1,861,273.2 | 2.17 | 0.02 | 2 | 0.0439 | 184 B |
| Runtime | 1,493.3 ns | 27.160 ns | 16.163 ns | 1,491.5 ns | 669,641.8 | 6.04 | 0.08 | 5 | 0.7534 | 3168 B |
| RuntimeAsync | 1,856.1 ns | 9.370 ns | 4.901 ns | 1,857.3 ns | 538,770.9 | 7.51 | 0.06 | 6 | 0.7515 | 3168 B |
| MsMemory | 781.5 ns | 11.407 ns | 7.545 ns | 778.2 ns | 1,279,641.6 | 3.16 | 0.04 | 3 | 0.1068 | 448 B |
| MsMemoryAsync | 1,314.3 ns | 6.401 ns | 3.809 ns | 1,313.2 ns | 760,852.3 | 5.32 | 0.05 | 4 | 0.3111 | 1312 B |
| Redis | 367,380.1 ns | 5,861.867 ns | 3,877.262 ns | 367,944.5 ns | 2,722.0 | 1,486.60 | 19.23 | 7 | - | 1372 B |
| RedisAsync | 434,222.9 ns | 11,983.929 ns | 7,926.628 ns | 431,761.0 ns | 2,303.0 | 1,757.08 | 33.66 | 8 | 0.4883 | 3916 B |
| Memcached | 382,094.3 ns | 42,926.427 ns | 28,393.175 ns | 371,250.6 ns | 2,617.2 | 1,546.14 | 109.73 | 7 | 2.9297 | 13974 B |
| MemcachedAsync | 388,666.3 ns | 5,296.766 ns | 3,503.483 ns | 388,939.6 ns | 2,572.9 | 1,572.74 | 18.62 | 7 | 2.9297 | 13972 B |
GetSingleBenchmark
BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
Runtime=Clr LaunchCount=1 TargetCount=10
WarmupCount=4
| Method | Mean | Error | StdDev | Median | Op/s | Scaled | ScaledSD | Rank | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Dictionary | 79.78 ns | 0.3568 ns | 0.2123 ns | 79.68 ns | 12,533,907.5 | 1.00 | 0.00 | 1 | - | 0 B |
| DictionaryAsync | 208.47 ns | 1.0048 ns | 0.6646 ns | 208.89 ns | 4,796,959.2 | 2.61 | 0.01 | 2 | - | 0 B |
| Runtime | 317.83 ns | 4.8045 ns | 3.1778 ns | 318.14 ns | 3,146,343.1 | 3.98 | 0.04 | 4 | 0.0496 | 208 B |
| RuntimeAsync | 472.30 ns | 22.5131 ns | 14.8910 ns | 466.07 ns | 2,117,308.2 | 5.92 | 0.18 | 6 | 0.0496 | 208 B |
| MsMemory | 237.90 ns | 3.0780 ns | 1.6099 ns | 237.70 ns | 4,203,440.9 | 2.98 | 0.02 | 3 | - | 0 B |
| MsMemoryAsync | 368.84 ns | 2.3832 ns | 1.4182 ns | 369.32 ns | 2,711,211.2 | 4.62 | 0.02 | 5 | - | 0 B |
| Redis | 249,763.77 ns | 2,653.5906 ns | 1,579.1098 ns | 249,455.05 ns | 4,003.8 | 3,130.54 | 20.24 | 7 | - | 2011 B |
| RedisAsync | 248,816.87 ns | 10,174.4389 ns | 6,054.6475 ns | 247,216.42 ns | 4,019.0 | 3,118.67 | 71.97 | 7 | - | 2011 B |
| Memcached | 270,554.35 ns | 17,462.0237 ns | 11,550.0480 ns | 264,914.12 ns | 3,696.1 | 3,391.12 | 137.60 | 8 | 3.4180 | 14392 B |
| MemcachedAsync | 274,713.17 ns | 10,972.4318 ns | 7,257.5846 ns | 274,374.21 ns | 3,640.2 | 3,443.25 | 86.73 | 8 | 3.4180 | 14392 B |
looks like we have a winner? ;)
Yes, ValueTask looks better compared to Task. But sync vs async for inmemory cache still significantly slower. I guess we better not change sync implementation and only add new API for async.
yup, agree. I might have to do more testing and optimizations to see why exactly it is much slower. But for sure, the statemachine stuff produces some overhead
@MichaCo, I added async methods to ICacheManager. Should I also include this into this pull request? The PR is quite large already so could it be better to create a new PR when this merged?
feel free to create another one, yup
Thanks for the great work btw! Hope I get some time soon to merge and review it properly.
Any updates to this? We have a project that could really benefit from async caching calls...
@cdonnellytx This might take a while until all this is totally done/done and released. It is a pretty big change and very little time atm Sorry ;)
+1 for this change, async is very important for my organisation
+1 for this change, async will improve the performance a lot
+1 we need this as well
any update on merging this in? seems like it’s been open for over a year.
+1 for this change. The async/sync code that we need to create to make this work could lead to threadpool starvation, in other words, this is a great library but definitely it needs this :) IMO the additional support for the async code poses minimal risk to the library and the benefits are many.
+1 for this change !
+1, any updates on async support?