runtime
Reduce contention for DataContract serialization
While profiling/stress testing my rewrite of OpenRiaServices from WCF to ASP.NET Core, I identified a bottleneck in the DataContractSerializer.
I detected contention for an API returning around 20 000 objects of a fairly simple type, with a data contract surrogate that looks much the same.
This fix is made in two steps:
- Improve GetBuiltInDataContract performance
- Improve GetID performance
Load testing was done from the local machine, so the load-testing tools (Netling/Westwind Web Surge) also consume resources.
Initially GetBuiltInDataContract was the source of contention.
Contention time for 20s run:
- The lock + Dictionary was replaced by a ConcurrentDictionary, removing the contention.
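To illustrate the shape of the fix (this is a hedged Python analogue, not the actual runtime code; the class and method names are mine): the original pattern takes a lock on every lookup, so hot concurrent readers serialize behind it, while the fixed pattern reads the map without a lock and only synchronizes on a cache miss, much like C#'s ConcurrentDictionary does internally.

```python
import threading

# Before: every lookup takes the lock, so concurrent readers contend on it.
class LockedCache:
    def __init__(self, factory):
        self._lock = threading.Lock()
        self._cache = {}
        self._factory = factory  # assumed to never return None

    def get(self, key):
        with self._lock:                      # contention point: even pure reads serialize here
            value = self._cache.get(key)
            if value is None:
                value = self._factory(key)
                self._cache[key] = value
            return value

# After: reads go straight to the dict; only a miss pays for synchronization.
# (The PR uses C#'s ConcurrentDictionary; here plain dict reads are atomic
# under CPython's GIL, which makes the lock-free read path safe.)
class LockFreeReadCache:
    def __init__(self, factory):
        self._lock = threading.Lock()
        self._cache = {}
        self._factory = factory  # assumed to never return None

    def get(self, key):
        value = self._cache.get(key)          # lock-free fast path
        if value is not None:
            return value
        with self._lock:                      # slow path: build and publish once
            value = self._cache.get(key)      # re-check: another thread may have won
            if value is None:
                value = self._factory(key)
                self._cache[key] = value
            return value
```

Once the cache is warm (which happens quickly for a fixed set of built-in contracts), every call takes the lock-free path.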
After fixing that, most of the contention seemed to move to GetId:
Contention time for 20s run after first commit:
- The lock + Dictionary was replaced by a ConcurrentDictionary, and the lock for the cache is now only taken on update.
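The GetId change follows the same "lock only on update" idea, but for a map from a type key to an int id. A hedged Python sketch of that pattern (names and structure are mine, for illustration only):

```python
import itertools
import threading

# Illustrative analogue of the GetId fix: lookups hit the shared map without
# locking; the lock is taken only when a previously unseen key needs an id.
class IdCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}                  # key -> int id (a ConcurrentDictionary in the PR)
        self._next = itertools.count(1)

    def get_id(self, key):
        id_ = self._ids.get(key)        # hot path: no lock on reads
        if id_ is not None:
            return id_
        with self._lock:                # update path only
            id_ = self._ids.get(key)    # another thread may have added it meanwhile
            if id_ is None:
                id_ = next(self._next)
                self._ids[key] = id_
            return id_
```

Since the set of distinct types seen by a serializer stabilizes quickly, the locked update path is essentially only exercised during warm-up.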
Other possible solutions / comments
- The first fix could perhaps set up the whole dictionary on first access and then keep it read-only? I decided to keep the current logic in order to keep the change small and not affect startup/first-access performance too much.
- I looked into caching a static Func for the factory method, but ran into problems due to RequiresUnreferencedCode, so I used a static lambda instead (which seems to be cached by Roslyn, but with an extra if statement).
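One subtlety behind the later "lazy" commit: GetOrAdd-style insertion on a concurrent map can invoke the value factory more than once when two threads race on the same key, so an expensive contract-creation factory is wrapped in a Lazy, which guarantees at-most-once execution. Below is a hedged Python sketch of that pattern (my own minimal Lazy and helper, not the runtime's code):

```python
import threading

# Minimal Lazy<T> analogue: the factory runs at most once, under a lock,
# even if several threads request the value concurrently.
class Lazy:
    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._has_value = False
        self._value = None

    @property
    def value(self):
        if not self._has_value:
            with self._lock:
                if not self._has_value:       # double-checked: only one thread builds
                    self._value = self._factory()
                    self._has_value = True
        return self._value

def get_or_add_lazy(cache, key, factory):
    lazy = cache.get(key)
    if lazy is None:
        # setdefault publishes exactly one Lazy per key (GetOrAdd analogue);
        # a racing thread may construct a throwaway Lazy, but only the
        # published one ever runs its factory.
        lazy = cache.setdefault(key, Lazy(factory))
    return lazy.value
```

The cost of a losing race is a cheap discarded Lazy wrapper rather than a duplicated expensive factory call.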
Numbers
On an AMD Ryzen 7 5800X (8 cores / 16 threads)
8 threads loading
| | net7 preview 4 | net7 preview 4 + first commit | net7 preview 4 + second commit | net7 preview 4 + 3rd commit |
|---|---|---|---|---|
| req/sec | 115 | 235 | 255 | 258 |
| data served | 5.1 GB | 10.5 GB | 11.3 GB | 11.5 GB |
| 95th percentile | 82.7 ms | 47.1 ms | 44.7 ms | 42.43 ms |
| 99th percentile | 94.9 ms | 72 ms | 63.6 ms | 51.51 ms |
16 threads loading
From left to right:
- net7 preview 4 "original" (with r2r)
- net7 preview 4 + first commit (~80% cpu load, including load generation)
- ~~net7 preview 4 + second commit~~ (100% cpu load including load generation)
- net7 preview 4 + 3rd commit (lazy) (100% cpu load)


- net7 preview 6 + 7th commit (changed key + int value and TryGet, similar to the code in https://github.com/dotnet/runtime/pull/71752)
1 or 2 threads loading
~~No significant difference between the versions to draw any conclusions.~~ Update (August): performance seems to be better even with no or little concurrency.
- ~5% for 1 thread (78 -> 82 rps)
- ~16% for 2 threads (135 -> 157 rps)
Other
- When only serializing 10 objects per query the improvement drops, but there is still a measurable improvement: ~165 000 rps vs ~143 000 rps for 16 threads (console logging set to warning).
Copying my reply here as well, since it is hidden when resolved.
I did a small benchmark (single-threaded) which just invokes GetId for different keys with almost no adds, to measure the overhead in the uncontended case, and settled on using ConcurrentDictionary + Lazy.
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.4.22252.9
[Host] : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
DefaultJob : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
| Method | Mean | Error | StdDev |
|---|---|---|---|
| Original | 2.693 μs | 0.0211 μs | 0.0197 μs |
| Id2 (using CollectionsMarshal) | 2.853 μs | 0.0160 μs | 0.0149 μs |
| Lazy | 3.435 μs | 0.0252 μs | 0.0236 μs |
| LazyAndValueKey | 2.565 μs | 0.0119 μs | 0.0111 μs |
| LazyAndValueKeyWithEqualsComparer | 2.558 μs | 0.0089 μs | 0.0083 μs |
| Faulty | 3.407 μs | 0.0238 μs | 0.0211 μs |
| RwlockSlim | 5.513 μs | 0.0208 μs | 0.0194 μs |
| HashSet | 2.784 μs | 0.0067 μs | 0.0056 μs |
| ConcurrentDictionary<, int> (commit 7) | 1.713 μs | 0.0157 μs | 0.0147 μs |
Update: I switched to using RuntimeTypeHandle as the key in order to make the code simpler (since no ThreadStatic storage is required to cache type handles), which also had the benefit of better performance than the original code. If you think that is not a good solution, feel free to skip the last commit.
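The point of that key change is to make the lookup key a cheap, directly hashable identity for the type, so no scratch key object has to be kept in thread-static storage. A hedged Python analogue (using the type object itself where the PR uses RuntimeTypeHandle; the function and state names are mine):

```python
import itertools
import threading

_lock = threading.Lock()
_ids = {}                       # type -> int id
_next_id = itertools.count(1)

def get_id(t: type) -> int:
    # TryGetValue-style fast path: the type object is already a hashable
    # identity, so the read needs no wrapper key and no lock.
    id_ = _ids.get(t)
    if id_ is None:
        with _lock:             # lock only when assigning a new id
            id_ = _ids.get(t)
            if id_ is None:
                id_ = next(_next_id)
                _ids[t] = id_
    return id_
```

The lookup logic is the same lock-only-on-update pattern as the earlier GetId fix; only the choice of key changes.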
In the original code GetBuiltInDataContract took 9.8% of the time and GetId took 0.5%. After commit 3 the numbers are 0.3% and 0.08%.
We have a very large PR in the works trying to reconcile the strange port of DCS that has been in .NET Core with the more complete version in .NET 4.8 (#71752). In doing this reconciliation, I believe this issue is also addressed, in a similar but not exactly the same manner. After that PR goes through, you should see performance improvement here. If not, please reopen the issue.
@StephenMolloy do you have a time frame for your large PR? Is there any chance of this getting reviewed in time to make a merge possible before the .NET 7 cut-off date on August 15?
I did a quick test of your PR and it only improves performance marginally, so I think it still makes sense to improve the two methods that are changed here.
As mentioned earlier, a similar approach to this particular issue has been brought back from .NET 4.8 in our mega-reconciliation of 7.0 with 4.8. #71752 has been checked in now. Hopefully you will see this show up in your performance testing of RC1.