FrameworkBenchmarks
Update Citrine to Ubuntu 22.04
Why?
For the kernel.
The new kernel in 22.04 fixes a lot of the performance degradation caused by the Spectre / Meltdown mitigation security fixes, which we noted from Round 18 to Round 19. https://www.phoronix.com/scan.php?page=news_item&px=FSGSBASE-For-Linux-5.9 https://www.phoronix.com/scan.php?page=article&item=x86-fsgsbase-xeonr https://www.phoronix.com/scan.php?page=news_item&px=FSGSBASE-v12-Linux-Patches
And the Intel Xeon Gold 5120 CPUs in Citrine were among the most affected by these security patches. https://www.phoronix.com/scan.php?page=article&item=linux-419-mitigations https://www.phoronix.com/scan.php?page=article&item=spectre-meltdown-2
Nginx example of the impact in Intel Xeon Gold

Nginx plaintext test:
- Round 18: 3,752,793 req/s
- Round 19: 2,244,289 req/s
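To put a number on the gap, here is a minimal sketch that computes the relative drop between the two figures above. The helper name `relative_drop` is just an illustration, not part of the TFB toolset.

```python
def relative_drop(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# Nginx plaintext results quoted above, in requests per second.
round_18 = 3_752_793
round_19 = 2_244_289

drop = relative_drop(round_18, round_19)
print(f"Round 18 -> Round 19: {drop:.1f}% fewer requests per second")
```

That works out to roughly a 40% loss in throughput for the same test on the same hardware.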
Great point @joanhey -- I'll see if we can get this done this week so we can give plenty of time to review results before we start putting deadlines on R21.
I'm not going to be able to get to this for this round, but it will be the first thing we do for Round 22. Also, may be some news around new machines at the same time. ;)
Please update the kernel before changing to the new machines.
So we can check the performance change!!
Why?
Because it's the LTS and that's what everyone is deploying, right? Even if it was slower, we should follow the LTS IMO.
Another reason to upgrade the operating system (and the kernel in particular) is in order to get any potential optimizations of the Retbleed mitigations, though those may require the Hardware Enablement (HWE) stack. As some people have noticed, starting with run ID 0cc85430-0486-4cdb-acbd-a910a6a50cda there has been a significant performance regression across the board, particularly in the fortunes and the JSON serialization tests. The initial suspect was PR #7783, since it was the only toolset change (that is, a change that would affect all test implementations) that happened between that run and the last successful run that preceded it, but #7832, #7852, and #7877 demonstrated that this was just a coincidence, so now I suspect that the most likely culprit is the Retbleed mitigations. This speculative execution vulnerability was announced on the 12th of July, but judging by the live results dashboard, Citrine was down at that time, so presumably it did not receive the relevant kernel upgrade (though I do not know if the TechEmpower team has enabled automatic security updates to the kernel). The first chance the benchmarking environment had to receive the Retbleed mitigations was during the subsequent downtime in September and October (kernel upgrades require a system restart), but it seems that for some reason no security fixes were applied. Finally, the next occasion on which Citrine had to be restarted was right before the problematic run, and this time apparently all pending kernel updates were applied, hence the performance regression.
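One way to confirm which mitigations are actually active on a machine is to read the kernel's sysfs reporting directory. The sketch below assumes a Linux host that exposes `/sys/devices/system/cpu/vulnerabilities`; the `retbleed` entry only appears on kernels that know about the vulnerability, and the helper name `read_mitigations` is made up for this example.

```python
from pathlib import Path


def read_mitigations(base="/sys/devices/system/cpu/vulnerabilities"):
    """Map each vulnerability name reported by the kernel to its status line.

    Returns an empty dict if the directory does not exist (non-Linux host
    or a kernel too old to expose it).
    """
    status = {}
    root = Path(base)
    if not root.is_dir():
        return status
    for entry in sorted(root.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "<unreadable>"
    return status


for name, state in read_mitigations().items():
    print(f"{name:28} {state}")
```

Running this before and after a kernel upgrade (and a reboot) would show whether the Retbleed entry changed between runs, which I cannot verify from the outside.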
As for the cost of the Retbleed mitigations, Phoronix has an article that explores it on an Intel Xeon E3-1245 v5 server, which is the same processor generation, Skylake, as the Xeon Gold 5120 in the Citrine environment. The 10.51% regression in the article is similar to what we are seeing. Note that not all test implementations are affected in the same way; for example, consider h2o in the plaintext test - it performs at least one system call per pipelined response, and as a result has regressed by 29.81%, while the top performers, which send all 16 pipelined responses with a single system call, have barely changed, if at all. In other words, frameworks that are not efficient to begin with have probably been disproportionately punished by the Retbleed mitigations.
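To illustrate the difference between the two strategies, here is a toy sketch (not h2o's or any framework's actual code) that answers 16 pipelined requests either with one send per response or with a single send for the whole batch. It counts `sendall` invocations on a local socket pair; each invocation maps to at least one send system call. The response body and the names `CountingSocket`, `respond_one_by_one`, `respond_batched`, and `measure` are assumptions made for the example.

```python
import socket

# A small plaintext-style response; the exact headers are illustrative.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, World!"
)
PIPELINE_DEPTH = 16  # pipelined requests per batch, as mentioned above


class CountingSocket:
    """Wraps a socket and counts how many times sendall is invoked."""

    def __init__(self, sock):
        self.sock = sock
        self.calls = 0

    def sendall(self, data):
        self.calls += 1
        self.sock.sendall(data)


def respond_one_by_one(conn, n):
    """One send per pipelined response."""
    for _ in range(n):
        conn.sendall(RESPONSE)


def respond_batched(conn, n):
    """Concatenate all responses and send them in one go."""
    conn.sendall(RESPONSE * n)


def measure(strategy):
    """Return (number of send calls, bytes received) for a strategy."""
    server, client = socket.socketpair()
    try:
        conn = CountingSocket(server)
        strategy(conn, PIPELINE_DEPTH)
        expected = len(RESPONSE) * PIPELINE_DEPTH
        received = b""
        while len(received) < expected:
            chunk = client.recv(65536)
            if not chunk:
                break
            received += chunk
        return conn.calls, len(received)
    finally:
        server.close()
        client.close()


print("one by one:", measure(respond_one_by_one))
print("batched:   ", measure(respond_batched))
```

Both strategies deliver the same bytes, but the first crosses the user/kernel boundary 16 times as often, and that boundary is where the mitigation overhead is paid.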
Speaking of which, there is a Linux kernel feature that has progressed quite a lot recently and that could reduce the number of system calls performed dramatically, as long as a framework is able to use it (which probably requires code changes), so it has the potential to be a game changer, at least in the current Citrine environment, which seems to be severely affected by all the mitigations for speculative execution vulnerabilities (which are in effect during system calls in particular); I think the JSON serialization results used to be around 1800000 requests per second, and are around 1300000 now. That feature is io_uring, but unfortunately as discussed elsewhere currently the kernel versions in the Citrine environment are too old to support it properly (or at all), so there is yet another reason to perform an OS upgrade.
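As a rough first check, a framework or test script could compare the running kernel release against the first release that shipped io_uring (Linux 5.1). This is only a sketch: the function names are hypothetical, the version strings in the usage note are examples, and passing the check says nothing about later io_uring features, which arrived in newer kernels.

```python
import platform
import re

# io_uring was first merged in Linux 5.1; later kernels added more features.
IO_URING_FIRST_KERNEL = (5, 1)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-67-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_support_io_uring(release):
    """True if the kernel is at least the first release that included io_uring."""
    return parse_kernel_release(release) >= IO_URING_FIRST_KERNEL


if platform.system() == "Linux":
    current = platform.release()
    print(current, "->", may_support_io_uring(current))
```

For example, `may_support_io_uring("4.15.0-20-generic")` returns `False`, while `may_support_io_uring("5.15.0-67-generic")` returns `True`.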
Example with Nginx plaintext results over time.

But it also affects database performance.
I'm going to let this run finish and then start on upgrades. I'm OOO Thursday so I will most likely start on Friday and interrupt the run. We may be down over the weekend, but this is long overdue. Thanks!
I know that 3 brand new machines are waiting in a lab somewhere in Redmond. Just saying...
We're ready! I think you're looped in on the last email. It'll be nice to get a few benchmarks in on the latest kernel to compare.
All 3 machines are on 22.04.2 now (5.15.0-67)
Just kicked off a run for the weekend. I'm back in the office on Tuesday in case something goes awry.
The first framework tested looks much worse than the two previous runs. Going to leave this open until I come back on Tuesday and we can verify these results. Feel free to comment in here if you see the same for other frameworks.
Can we have the CI updated too, so people can experiment with new kernel features in PRs?
The relevant settings are here, I believe, so you can just open a PR.
The top results seem pretty much on par with the last few runs but the diff is kind of all over the place. With a lot of things changing, it's hard to do a 1:1 comparison, but I don't see any real issues. If anything comes up related to this, please feel free to open a new issue.
Conclusion: The new kernels don't fix the performance degradation from the security fixes. And with the last straw (the Retbleed mitigations), it is even worse.
@joanhey according to #8038 I can agree
When will the new servers arrive? And what CPU do they have?
We are currently having an issue with the network cards that is being resolved with the provider (AFAIK the card is not supported in Ubuntu and might need to be replaced). CPU is https://ark.intel.com/content/www/us/en/ark/products/212458/intel-xeon-gold-6330-processor-42m-cache-2-00-ghz.html
@nbrady-techempower Perhaps it is time to remove the message displayed on top of the tfb-status page? It has been almost 2 months since the upgrade, and in fact the message is already outdated - now the kernel version is actually 5.15.0-70.
@nbrady-techempower hopefully we can get the new servers for Round 22 or 23? Excited to see the runs after the migration
I can resolve that in 2 days. But I am NOT Microsoft !!!!
22.04.3 LTS was released with 6.2 kernel support which could be interesting perf wise. There could be some perf bump for frameworks powered by io_uring.