Performance degradation in 9.2 compared to 9.0

Open mastoj opened this issue 8 months ago • 6 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Describe the bug

When upgrading from 9.0 to 9.2 in my Orleans application, I noticed a two-orders-of-magnitude drop in the number of requests handled in my performance tests.

I have created a small reproduction of the issue. The main branch lets you compare 9.2 against the same app hosted using docker-compose, and there is also an aspire-9.0 branch that you can use to test 9.0. On my machine, 9.2 gets two orders of magnitude fewer requests through in my performance test than docker-compose and 9.0.

Expected Behavior

I expected the performance to be as good or better in 9.2 compared to 9.0.

Steps To Reproduce

All you need to know to reproduce the issue is in the readme of my repo with the reproduction: https://github.com/mastoj/orleansperfrepro/blob/main/readme.md

Exceptions (if any)

No response

.NET Version info

❯ dotnet --info
.NET SDK:
  Version: 9.0.100
  Commit: 59db016f11
  Workload version: 9.0.100-manifests.3068a692
  MSBuild version: 17.12.7+5b8665660

Runtime Environment:
  OS Name: Mac OS X
  OS Version: 15.3
  OS Platform: Darwin
  RID: osx-arm64
  Base Path: /usr/local/share/dotnet/sdk/9.0.100/

.NET workloads installed:
There are no installed workloads to display.
Configured to use loose manifests when installing new manifests.

Host:
  Version: 9.0.0
  Architecture: arm64
  Commit: 9d5a6a9aa4

Anything else?

No response

mastoj avatar Apr 21 '25 19:04 mastoj

cc @karolz-ms @danegsta @ReubenBond

davidfowl avatar Apr 21 '25 19:04 davidfowl

I have noticed this degradation as well when upgrading from Aspire 9.0 to 9.1. I do not use Orleans. It is a medium-sized project with Nats, Mqtt, and several other web API services. With no change other than upgrading to Aspire 9.1, the system goes from handling several thousand notifications per second to under one thousand. The Aspire dcpctrl process uses over 60% of total CPU.

AlexNik4 avatar Apr 24 '25 16:04 AlexNik4

To be clear, I think I also saw it from 9.0 to 9.1, but verified it now with 9.0 and 9.2.

mastoj avatar Apr 24 '25 18:04 mastoj

Is this a real issue or am I doing something wrong?

mastoj avatar Apr 27 '25 11:04 mastoj

Hi @mastoj

I ran your load test on my machine and got ~750k for 9.2 and ~650k for 9.0

Could you please recheck your result?

illay1994 avatar Apr 27 '25 19:04 illay1994

@illay1994 are you 100% sure you were running the right versions of the code?

Here are my results for 9.2 just now:

Image

And for 9.0:

Image

The only difference between my two runs is that one is from the main branch (uses Aspire 9.2) and the other is from the aspire-9.0 branch. Everything else should be the same between the two runs.

mastoj avatar Apr 27 '25 20:04 mastoj

Has anyone other than me managed to reproduce the issue?

mastoj avatar May 07 '25 10:05 mastoj

@danegsta, what was the resolution here? Is there a PR that fixed this?

mastoj avatar May 15 '25 22:05 mastoj

We diagnosed a performance regression in the orchestrator that provides the service proxies; the fix will be in our next insertion (likely either tomorrow or Monday). The repro project was very helpful in tracking things down!

danegsta avatar May 15 '25 22:05 danegsta

Specifically, we'd made a fix for a data corruption bug that occurred under very particular circumstances, but that introduced the perf regression you reported. We reworked the logic to fix both cases and added a few extra performance tests for that area to help avoid regressing either again.

danegsta avatar May 15 '25 22:05 danegsta

@danegsta, awesome, and glad I could help!

Will the fix be released as a minor update or do I have to wait?

mastoj avatar May 15 '25 23:05 mastoj

I doubt it'll make the cutoff for the 9.3 release as we're pretty well locked down for anything but breaking changes, but I'll raise taking it for a 9.3.1 servicing release. Worst case it'd go out with the 9.4 release.

danegsta avatar May 15 '25 23:05 danegsta

@mastoj the fix is in main now if you want a chance to test it out.

danegsta avatar May 16 '25 01:05 danegsta

You should be able to work around it by turning off the proxy, right?

davidfowl avatar May 16 '25 10:05 davidfowl

@davidfowl is there an easy way to do that?

mastoj avatar May 16 '25 12:05 mastoj

Add a ProxySupportAnnotation to your resource using WithAnnotation and set its bool to false.
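
A minimal sketch of what that might look like in an AppHost Program.cs. The resource name "cache" and the redis image are placeholders for illustration only, and the ProxyEnabled property name is an assumption based on the Aspire.Hosting API as I understand it, so check it against the version you have installed. Turning the proxy off may also mean the resource's endpoints need to be wired to fixed ports, which this sketch does not cover.

```csharp
// AppHost Program.cs (sketch; requires the Aspire.Hosting package).
using Aspire.Hosting;
using Aspire.Hosting.ApplicationModel;

var builder = DistributedApplication.CreateBuilder(args);

// "cache" and "redis" are hypothetical placeholders, not part of the repro project.
builder.AddContainer("cache", "redis")
       // Assumption: ProxySupportAnnotation exposes a bool named ProxyEnabled.
       // Setting it to false asks the orchestrator not to put its service proxy
       // in front of this resource's endpoints.
       .WithAnnotation(new ProxySupportAnnotation { ProxyEnabled = false });

builder.Build().Run();
```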

davidfowl avatar May 16 '25 13:05 davidfowl

It was not as simple as just adding the annotation. Since this is a POC I am working on, I will just wait for the real fix. For now, it is basically easier to hand-wire things, as I see it.

mastoj avatar May 21 '25 16:05 mastoj

@danegsta, will there be a 9.3.1 release where this will be fixed?

mastoj avatar Jun 03 '25 13:06 mastoj