Performance degradation in 9.2 compared to 9.0
Is there an existing issue for this?
- [x] I have searched the existing issues
Describe the bug
When upgrading my Orleans application from 9.0 to 9.2, I noticed a degradation of two orders of magnitude in the number of requests handled in my performance tests.
I have created a small reproduction of the issue. In the main branch you can compare 9.2 against the same app hosted using docker-compose, and there is also an aspire-9.0 branch that you can use to test 9.0. On my machine, 9.2 gets two orders of magnitude fewer requests through in my performance test than docker-compose and 9.0.
Expected Behavior
I expected the performance to be as good or better in 9.2 compared to 9.0.
Steps To Reproduce
All you need to know to reproduce the issue is in the readme of my repo with the reproduction: https://github.com/mastoj/orleansperfrepro/blob/main/readme.md
Exceptions (if any)
No response
.NET Version info
❯ dotnet --info
.NET SDK:
 Version:           9.0.100
 Commit:            59db016f11
 Workload version:  9.0.100-manifests.3068a692
 MSBuild version:   17.12.7+5b8665660

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  15.3
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/9.0.100/

.NET workloads installed:
There are no installed workloads to display.
Configured to use loose manifests when installing new manifests.

Host:
 Version:      9.0.0
 Architecture: arm64
 Commit:       9d5a6a9aa4
Anything else?
No response
cc @karolz-ms @danegsta @ReubenBond
I have noticed this degradation as well when upgrading from Aspire 9.0 to 9.1. I do not use Orleans. It is a medium-sized project with NATS, MQTT, and several other web API services. With no change other than upgrading to Aspire 9.1, the system goes from handling several thousand notifications per second to under one thousand. The Aspire dcpctrl process uses over 60% of total CPU.
To be clear, I think I also saw it going from 9.0 to 9.1, but I have now verified it with 9.0 and 9.2.
Is this a real issue or am I doing something wrong?
Hi @mastoj
I ran your load test on my machine and got ~750k for 9.2 and ~650k for 9.0.
Could you please recheck your result?
@illay1994 are you 100% sure you were running the right versions of the code?
Here are my results for 9.2 just now:
And for 9.0:
The only difference between my two runs is that one is from the main branch (uses Aspire 9.2) and the other is from the aspire-9.0 branch. Everything else should be the same between the two runs.
Has anyone other than me managed to reproduce the issue?
@danegsta , what was the resolution here? Any PR that fixed this?
We diagnosed a performance regression in the orchestrator that provides the service proxies; the fix will be in our next insertion (likely either tomorrow or Monday). The repro project was very helpful in tracking things down!
Specifically, we'd made a fix for a data corruption bug that occurred under very particular circumstances, but that introduced the perf regression you reported. We reworked the logic to fix both cases and added a few extra performance tests for that area to help avoid regressing either again.
@danegsta , awesome and glad I could help!
Will the fix be released as a minor update or do I have to wait?
I doubt it'll make the cutoff for the 9.3 release as we're pretty well locked down for anything but breaking changes, but I'll raise taking it for a 9.3.1 servicing release. Worst case it'd go out with the 9.4 release.
@mastoj the fix is in main now if you want a chance to test it out.
You should be able to work around it by turning off the proxy, right?
@davidfowl is there an easy way to do that?
Add a ProxySupportAnnotation to your resource using WithAnnotation and set its bool to false.
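A minimal sketch of what that suggestion might look like in an AppHost Program.cs, assuming the Aspire.Hosting package is referenced and that the bool on ProxySupportAnnotation is the ProxyEnabled property. The resource name "cache", the redis image, and the port numbers are hypothetical placeholders, not taken from the repro. This is untested against the repro project, and the annotation alone may not be sufficient in every setup.

```csharp
// AppHost Program.cs (sketch; assumes the Aspire.Hosting package is referenced).
using Aspire.Hosting;
using Aspire.Hosting.ApplicationModel;

var builder = DistributedApplication.CreateBuilder(args);

// "cache", the "redis" image, and port 6379 are hypothetical placeholders.
builder.AddContainer("cache", "redis")
       // A fixed host port is given here on the assumption that an
       // unproxied endpoint is reached directly on that port.
       .WithEndpoint(port: 6379, targetPort: 6379, name: "tcp")
       // Ask the orchestrator not to put its service proxy in front of
       // this resource's endpoints (ProxyEnabled is assumed to be the
       // bool referred to above).
       .WithAnnotation(new ProxySupportAnnotation { ProxyEnabled = false });

builder.Build().Run();
```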
It was not as simple as just adding the annotation. Since this is a POC I am working on, I will just wait for the real fix. For now it is basically easier to hand-wire things, as I see it.
@danegsta , will there be a 9.3.1 release where this will be fixed?