
Performance testing

Open henderkes opened this issue 5 months ago • 17 comments

Placeholder issue for now. I will prepare detailed instructions to establish a baseline on your system, and we can play around with optimisations later.

Copy-paste from zig pr:

Test settings:

PHP: 8.4.10
Test System: RHEL 10. GCC 14.2.1, Zig 0.15-master (Clang 20.1.2), Clang 19.1.7. i7 13700, 16 GB RAM (WSL).
CFLAGS: -fpic -fpie -O3 -march=x86-64-v3
Extensions: ./configure --disable-all --with-openssl --enable-opcache=shared --with-zlib --with-zip --with-bz2 --enable-dom --enable-simplexml --enable-gd --enable-posix --enable-pcntl --with-libxml --with-readline

Static and dynamic compilation don't make a difference in runtime performance. I ran static and dynamic builds 5x each and got results within 1% of each other in either direction at most. This was more or less expected: static compilation should allow for greater optimization in theory, but I suppose a codebase this complex just doesn't make proper use of it. All following tests are run against shared libraries because recompiling statically every time would take longer.

ZTS and NTS don't make a big difference: NTS is ~1-2% faster on average. The highest difference between runs I saw was 4.5% in favour of NTS, but I also saw runs where ZTS was faster.

zig-cc (LLVM 20, native-native-gnu, so host glibc) and clang (19) make no difference in terms of runtime performance. zig nts with 116k vs clang nts with... 116k.

LTO made a negligible (~2%) difference in performance, but an insane difference in compile time (2x thin - 7x fat!). zig cc lto ZTS with 116k is on par with NTS. I raised an issue on php to make php <= 8.3 compatible with lto. The big issue here is that all our libraries also link their own programs with -flto unnecessarily. If we could link only php and its extensions with lto, it would probably only be a ~20% difference in total time.

Now to the unfortunate part... zig-cc or clang vs gcc (14.2) make a massive performance difference. Talking about 22-27% faster performance with gcc (136k zts, 137k nts). I haven't tested the old centos 7 (gcc 10) for performance yet. This is due to gcc global registers. When building with the configure option --disable-gcc-global-regs, performance is slightly slower than Clang's.

Remi gcc performed far better than anything I compiled locally. Not sure why - maybe because of different extension sets? 160k compared to my local 137k with higher optimisation flags. It performs better with gcc -O2 locally too with 139k. I have not found the reason yet. When I recompile his RPM from source, I get the same 160k performance. But when I copy the CFLAGS and LDFLAGS and apply them in either a static or dynamic manual build, I'm back to ~137k.

Side note: GCC fails with LTO when global registers are used. But global registers have a ~25% speed increase, while LTO only offers ~2%. Clang does not support global registers yet. https://clang.llvm.org/docs/UsersManual.html#gcc-extensions-not-implemented-yet

Edit: Shivam Mathur's php zts is also missing these 25% (+ in this case).
Edit 2: Official Appstream php NTS is as fast as remi's, so it must be related to the rpmbuild system somehow.
Edit 3: Official ubuntu images NTS at 135k...

henderkes avatar Jul 29 '25 10:07 henderkes

I'm not really getting any further, sadly. Here's a written down version of how to reproduce it:

RHEL 10 installation required.

Installation of required things:

sudo subscription-manager repos --enable codeready-builder-for-rhel-10-x86_64-rpms
sudo dnf install epel-release
sudo dnf install https://rpms.remirepo.net/enterprise/remi-release-10.rpm
sudo dnf module enable php:remi-8.4
sudo dnf install php
sudo dnf install phoronix-test-suite

Testing:

PHP_BIN=/usr/bin/php phoronix-test-suite benchmark phpbench
# compile php with static-php-cli
PHP_BIN=~/static-php-cli/buildroot/bin/php phoronix-test-suite benchmark phpbench

Self-compiling rpms:

sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y dnf-utils rpm-build rpmdevtools yum-utils gcc gcc-c++ libxml2-devel openssl-devel bzip2-devel libicu-devel
rpmdev-setuptree
git clone https://git.remirepo.net/rpms/php/php84.git
mv php84/php84.spec ~/rpmbuild/SPECS/
mv php84/* ~/rpmbuild/SOURCES/
wget https://www.php.net/distributions/php-8.4.11.tar.xz
wget https://www.php.net/distributions/php-8.4.11.tar.xz.asc
mv php-8.4.11.tar.xz php-8.4.11.tar.xz.asc ~/rpmbuild/SOURCES/
cd ~/rpmbuild/SPECS
sudo dnf builddep php84.spec
rpmbuild -ba php84.spec

You can install them with sudo dnf install ~/rpmbuild/RPMS/x86_64/php*.rpm.

henderkes avatar Jul 30 '25 05:07 henderkes

I'm finally making some progress. Found a few different ./configure settings that have a small impact on performance and... this one should have been obvious... realized that remi/system php on RHEL is the only one to enable opcache for cli...

PHP_BIN=/home/m/static-php-cli/buildroot/bin/php phoronix-test-suite benchmark phpbench
->
    PHP Benchmark Suite:
        1712465
        1601557
        1685342
        1608505
        1697826
        1618018
        1715997
        1363717
        1687988
        1609318
        1647410
        1582247

    Average: 1627533 Score
    Deviation: 5.87%
    Samples: 12
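
The summary lines can be re-derived from the raw scores. The sketch below assumes that "Deviation" is the sample standard deviation divided by the mean. That assumption is inferred from the numbers (it reproduces the 5.87% above) and is not confirmed by the Phoronix documentation. The `summarize` helper is a hypothetical name used only for illustration.

```python
import statistics

# Scores copied from the phoronix-test-suite output above.
samples = [
    1712465, 1601557, 1685342, 1608505, 1697826, 1618018,
    1715997, 1363717, 1687988, 1609318, 1647410, 1582247,
]


def summarize(scores):
    """Return (mean, relative deviation in percent) for a list of scores."""
    mean = statistics.mean(scores)
    # Assumption: deviation = sample standard deviation (n - 1) / mean.
    deviation_pct = statistics.stdev(scores) / mean * 100
    return mean, deviation_pct


mean, deviation_pct = summarize(samples)
print(f"Average: {mean:.1f}  Deviation: {deviation_pct:.2f}%  Samples: {len(samples)}")
```

Most of the spread comes from the single 1363717 run; the other eleven samples sit between roughly 1.58 and 1.72 million.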

I will think about ways to integrate these into spc after zig is merged. I will have to redo zig tests in case this brings them reasonably close without global registers, but with PGO.

henderkes avatar Jul 30 '25 15:07 henderkes

We're cooking

    PHP Benchmark Suite:
        1816379
        1799310
        1797994

    Average: 1804561 Score
    Deviation: 0.57%
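
For a rough sense of scale, the two reported averages can be compared directly. This is only a sketch: the runs use different sample counts (12 vs 3), and the thread does not say exactly what changed between them. `percent_change` is a hypothetical helper name.

```python
def percent_change(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100


earlier_average = 1627533  # 12-sample run from the previous comment
latest_average = 1804561   # 3-sample run above

gain = percent_change(earlier_average, latest_average)
print(f"{gain:.1f}% higher score than the earlier run")
```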

henderkes avatar Jul 31 '25 05:07 henderkes

https://github.com/crazywhalecc/static-php-cli/pull/861#issuecomment-3213081930

crazywhalecc avatar Aug 22 '25 06:08 crazywhalecc

Keeping in mind that I only have a docker container (x86_64) to test macos...

homebrew installed php: 90k points
static-php-cli clang -Os: 85k points
static-php-cli clang -O3: 93k points
static-php-cli gcc-15 -Os: 103k points
static-php-cli gcc-15 -O3: 105k points
static-php-cli gcc-15 -O3 opcache: 130k points
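
To put these scores on one scale, each build can be expressed relative to the Homebrew baseline. This is a minimal sketch using the rounded figures above, so the percentages are only as precise as that rounding; the dictionary keys are just labels chosen for the illustration.

```python
# Scores in thousands of points, as reported above (rounded by the author).
scores = {
    "homebrew installed php": 90,
    "static-php-cli clang -Os": 85,
    "static-php-cli clang -O3": 93,
    "static-php-cli gcc-15 -Os": 103,
    "static-php-cli gcc-15 -O3": 105,
    "static-php-cli gcc-15 -O3 opcache": 130,
}

baseline = scores["homebrew installed php"]

# Percent difference of every build against the Homebrew baseline.
relative = {
    name: (score - baseline) / baseline * 100
    for name, score in scores.items()
}

for name, pct in relative.items():
    print(f"{name:<36} {pct:+6.1f}%")
```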

Looks like macOS is generally slower than Linux for me, but that could be explained by it being a docker container instead of a VM. But, just like on Linux, gcc-15 produces much faster code than clang due to global register variables. Could you test this on aarch64?

henderkes avatar Aug 30 '25 08:08 henderkes

Well, good news for aarch64 users - clang and gcc appear to produce identical benchmarks. The bad news is that it's much slower compared to x86_64, which makes sense when the global registers optimization is only for x86_64.

henderkes avatar Aug 31 '25 13:08 henderkes

Well, crap, I might have to retract my last statement. I was testing with the -Os default, which performed identically at 43k points between zig and gcc. Recompiling with -O3 instead, I get:

gcc-11 -O3 w/o opcache: 53k
gcc-11 -O3 w/ opcache: 64k
zig 16 -O3 w/o opcache: 43k
zig 16 -O3 w/ opcache: 51k

So yeah... that's huge if it's the case on macOS too, which I suspect. The source only checks for GNU libc once, but global registers will still be used without _GNU (source). Switching to a php version compiled by gcc instead of clang might provide a ~15-20% performance uplift, @dunglas.

henderkes avatar Aug 31 '25 16:08 henderkes

Damn, gcc-14 is consistently faster than gcc-11 on top: 67k for O3. -Ofast is even faster at 70k, but that might not be worth it because of standard compliance.

henderkes avatar Aug 31 '25 16:08 henderkes

@dunglas Shivam Mathur would understandably prefer to stay in line with the official brew tap and the official brew tap closed my issue (which was admittedly misplaced as a bug report, for lack of a better place to raise awareness).

I don't use macOS so it doesn't impact me, but maybe you're interested in pursuing it further.

henderkes avatar Sep 02 '25 07:09 henderkes

@henderkes I think you can reopen it on the official homebrew repo as long as you copy the output of the commands they ask for.

dunglas avatar Sep 02 '25 13:09 dunglas

It seems that you can force Homebrew to compile with GCC with:

brew install --build-from-source --cc=gcc php

dunglas avatar Sep 02 '25 13:09 dunglas

In a twisted turn of events, the new (php 8.5+) tail call VM is faster than the hybrid VM that we get with gcc. Should we strive to make Clang the default for 8.5, or keep it consistent with php <= 8.4?

henderkes avatar Oct 13 '25 10:10 henderkes

Out of curiosity, I just ran the tests on my MacBook Air M2 (2022) (8 cores, battery mode). PHP is the current version from brew: PHP 8.4.14 (cli) (built: Oct 21 2025 19:23:55) (NTS). The result with opcache is: Average: 2,187,914 Score

MacGritsch avatar Nov 18 '25 17:11 MacGritsch

Homebrew was updated to use gcc, so you're getting the benefit of global register variables with php < 8.5.

henderkes avatar Nov 18 '25 17:11 henderkes

Thanks, that is great! And the numbers are impressive (on such an old notebook) in comparison to some Windows servers I have access to.

MacGritsch avatar Nov 18 '25 18:11 MacGritsch

Php on windows is scuffed, but linux performs even better than macOS (by a slim amount now). M2 is only three years old, my production 7950x3d hits around 1.9 million as well (keep in mind phoronix measures single core performance).

henderkes avatar Nov 18 '25 18:11 henderkes

Yea, I know. There would be the possibility to use Linux on Windows to improve speed a bit, but it's not my decision - otherwise I would not develop on macOS and push the finished product to a Windows server :)

MacGritsch avatar Nov 18 '25 18:11 MacGritsch