runtime
Reduce contention for DataContract serialization
While profiling/stress testing my rewrite of OpenRiaServices from WCF to ASP.NET Core, I identified a bottleneck in the DataContractSerializer.
I detected contention for an API returning around 20 000 objects of a fairly simple type, with a data contract surrogate that looks much the same.
This fix is made in two steps:
- Improve GetBuiltInDataContract performance
- Improve GetID performance
Load testing was done from the local machine, so the load-testing tools (Netling/Westwind Web Surge) also consume resources.
Initially GetBuiltInDataContract was the source of contention.
Contention time for 20s run:
- The lock + Dictionary was replaced by a ConcurrentDictionary, removing the contention.
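To illustrate the shape of the fix (this is a hedged Python analogue, not the actual runtime code; the class and method names are mine): the original pattern takes a lock on every lookup, so hot concurrent readers serialize behind it, while the fixed pattern reads the map without a lock and only synchronizes on a cache miss, much like C#'s ConcurrentDictionary does internally.

```python
import threading

# Before: every lookup takes the lock, so concurrent readers contend on it.
class LockedCache:
    def __init__(self, factory):
        self._lock = threading.Lock()
        self._cache = {}
        self._factory = factory  # assumed to never return None

    def get(self, key):
        with self._lock:                      # contention point: even pure reads serialize here
            value = self._cache.get(key)
            if value is None:
                value = self._factory(key)
                self._cache[key] = value
            return value

# After: reads go straight to the dict; only a miss pays for synchronization.
# (The PR uses C#'s ConcurrentDictionary; here plain dict reads are atomic
# under CPython's GIL, which makes the lock-free read path safe.)
class LockFreeReadCache:
    def __init__(self, factory):
        self._lock = threading.Lock()
        self._cache = {}
        self._factory = factory  # assumed to never return None

    def get(self, key):
        value = self._cache.get(key)          # lock-free fast path
        if value is not None:
            return value
        with self._lock:                      # slow path: build and publish once
            value = self._cache.get(key)      # re-check: another thread may have won
            if value is None:
                value = self._factory(key)
                self._cache[key] = value
            return value
```

Once the cache is warm (which happens quickly for a fixed set of built-in contracts), every call takes the lock-free path.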
After fixing that, most of the contention seemed to move to GetId:
Contention time for 20s run after first commit:
- The lock + Dictionary was replaced by a ConcurrentDictionary, and the lock for the cache is now only taken on update.
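The GetId change follows the same "lock only on update" idea, but for a map from a type key to an int id. A hedged Python sketch of that pattern (names and structure are mine, for illustration only):

```python
import itertools
import threading

# Illustrative analogue of the GetId fix: lookups hit the shared map without
# locking; the lock is taken only when a previously unseen key needs an id.
class IdCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}                  # key -> int id (a ConcurrentDictionary in the PR)
        self._next = itertools.count(1)

    def get_id(self, key):
        id_ = self._ids.get(key)        # hot path: no lock on reads
        if id_ is not None:
            return id_
        with self._lock:                # update path only
            id_ = self._ids.get(key)    # another thread may have added it meanwhile
            if id_ is None:
                id_ = next(self._next)
                self._ids[key] = id_
            return id_
```

Since the set of distinct types seen by a serializer stabilizes quickly, the locked update path is essentially only exercised during warm-up.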
Other possible solutions / comments
- The first fix could perhaps set up the whole dictionary on first access and then keep it read-only? I decided to keep the current logic in order to keep the change small and not affect startup/first-access performance too much.
- I looked into caching a static Func for the factory method, but ran into problems due to RequiresUnreferencedCode, so I used a static lambda instead (which seems to be cached by Roslyn, but with an extra if statement).
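One subtlety behind the later "lazy" commit: GetOrAdd-style insertion on a concurrent map can invoke the value factory more than once when two threads race on the same key, so an expensive contract-creation factory is wrapped in a Lazy, which guarantees at-most-once execution. Below is a hedged Python sketch of that pattern (my own minimal Lazy and helper, not the runtime's code):

```python
import threading

# Minimal Lazy<T> analogue: the factory runs at most once, under a lock,
# even if several threads request the value concurrently.
class Lazy:
    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._has_value = False
        self._value = None

    @property
    def value(self):
        if not self._has_value:
            with self._lock:
                if not self._has_value:       # double-checked: only one thread builds
                    self._value = self._factory()
                    self._has_value = True
        return self._value

def get_or_add_lazy(cache, key, factory):
    lazy = cache.get(key)
    if lazy is None:
        # setdefault publishes exactly one Lazy per key (GetOrAdd analogue);
        # a racing thread may construct a throwaway Lazy, but only the
        # published one ever runs its factory.
        lazy = cache.setdefault(key, Lazy(factory))
    return lazy.value
```

The cost of a losing race is a cheap discarded Lazy wrapper rather than a duplicated expensive factory call.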
Numbers
On an AMD Ryzen 7 5800X (8 cores / 16 threads)
8 threads loading
| | net7 preview 4 | net7 preview 4 + first commit | net7 preview 4 + second commit | net7 preview 4 + 3rd commit |
|---|---|---|---|---|
| req/sec | 115 | 235 | 255 | 258 |
| data served | 5.1 GB | 10.5 GB | 11.3 GB | 11.5 GB |
| 95th percentile | 82.7 ms | 47.1 ms | 44.7 ms | 42.43 ms |
| 99th percentile | 94.9 ms | 72 ms | 63.6 ms | 51.51 ms |
16 threads loading
From left to right:
- net7 preview 4 "original" (with r2r)
- net7 preview 4 + first commit (~80% cpu load, including load generation)
- ~~net7 preview 4 + second commit~~ (100% cpu load including load generation)
- net7 preview 4 + 3rd commit (lazy) (100% cpu load)


- net7 preview 6 + 7th commit (changed key + int value and TryGet, similar to the code in https://github.com/dotnet/runtime/pull/71752)
1 or 2 threads loading
~~No significant difference between the versions to draw any conclusions.~~ Update (August): performance seems to be better even with no or little concurrency.
- ~5% for 1 thread (78 -> 82 rps)
- ~16% for 2 threads (135 -> 157 rps)
Other
- When only serializing 10 objects per query the improvement drops, but there is still a measurable improvement: ~165 000 rps vs ~143 000 rps for 16 threads (console logging set to warning).
Copying my reply here as well, since it is hidden when resolved.
I did a small benchmark (single-threaded) which just invokes GetId for different keys with almost no adds, to measure the overhead in the uncontended case, and settled on using ConcurrentDictionary + Lazy.
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.4.22252.9
[Host] : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
DefaultJob : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
| Method | Mean | Error | StdDev |
|---|---|---|---|
| Original | 2.693 μs | 0.0211 μs | 0.0197 μs |
| Id2 (using CollectionsMarshal) | 2.853 μs | 0.0160 μs | 0.0149 μs |
| Lazy | 3.435 μs | 0.0252 μs | 0.0236 μs |
| LazyAndValueKey | 2.565 μs | 0.0119 μs | 0.0111 μs |
| LazyAndValueKeyWithEqualsComparer | 2.558 μs | 0.0089 μs | 0.0083 μs |
| Faulty | 3.407 μs | 0.0238 μs | 0.0211 μs |
| RwlockSlim | 5.513 μs | 0.0208 μs | 0.0194 μs |
| HashSet | 2.784 μs | 0.0067 μs | 0.0056 μs |
| ConcurrentDictionary<, int> (commit 7) | 1.713 μs | 0.0157 μs | 0.0147 μs |
Update: I switched to using RuntimeTypeHandle as the key in order to make the code simpler (since no ThreadStatic storage is required to cache type handles), which also had the benefit of better performance than the original code. If you think that is not a good solution, feel free to skip the last commit.
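The point of that key change is to make the lookup key a cheap, directly hashable identity for the type, so no scratch key object has to be kept in thread-static storage. A hedged Python analogue (using the type object itself where the PR uses RuntimeTypeHandle; the function and state names are mine):

```python
import itertools
import threading

_lock = threading.Lock()
_ids = {}                       # type -> int id
_next_id = itertools.count(1)

def get_id(t: type) -> int:
    # TryGetValue-style fast path: the type object is already a hashable
    # identity, so the read needs no wrapper key and no lock.
    id_ = _ids.get(t)
    if id_ is None:
        with _lock:             # lock only when assigning a new id
            id_ = _ids.get(t)
            if id_ is None:
                id_ = next(_next_id)
                _ids[t] = id_
    return id_
```

The lookup logic is the same lock-only-on-update pattern as the earlier GetId fix; only the choice of key changes.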
In the original code GetBuiltInDataContract took 9.8% of the time and GetId took 0.5%. After commit 3 the numbers are 0.3% and 0.08%.
We have a very large PR in the works trying to reconcile the strange port of DCS that has been in .NET Core with the more complete version in .NET 4.8 (#71752). In doing this reconciliation, I believe this issue is also addressed, in a similar but not exactly the same manner. After that PR goes through, you should see performance improvement here. If not, please reopen the issue.
@StephenMolloy do you have a time frame for your large PR? Is there any chance of this getting reviewed in time to make a merge possible before the .NET 7 cut-off date on August 15?
I did a quick test of your PR and it only improves performance marginally, so I think it still makes sense to improve the two methods that are changed here.
As mentioned earlier, a similar approach to this particular issue has been brought back from .NET 4.8 in our mega-reconciliation of 7.0 with 4.8. #71752 has been checked in now. Hopefully you will see this show up in your performance testing of RC1.