Performance testing
Placeholder issue for now. I will prepare detailed instructions to establish a baseline on your system, and we can play around with optimisations later.
Copy-paste from the Zig PR:
Test settings:
PHP: 8.4.10
Test system: RHEL 10 (WSL), i7 13700, 16 GB RAM
Compilers: GCC 14.2.1, Zig 0.15-master (Clang 20.1.2), Clang 19.1.7
CFLAGS: -fpic -fpie -O3 -march=x86-64-v3
Extensions: ./configure --disable-all --with-openssl --enable-opcache=shared --with-zlib --with-zip --with-bz2 --enable-dom --enable-simplexml --enable-gd --enable-posix --enable-pcntl --with-libxml --with-readline
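For anyone reproducing the baseline, the configure step from the test settings can be wrapped in a small shell function. This is a minimal sketch, assuming an already unpacked php-src tree whose path is passed as the first argument; `configure_php` is a hypothetical helper name, and any extra arguments (for example `--enable-zts` for a ZTS build) are forwarded to ./configure unchanged.

```shell
#!/bin/sh
# Hypothetical helper: run PHP's ./configure with the CFLAGS and extension
# set listed in the test settings. $1 is the path to an unpacked php-src tree
# (placeholder); remaining arguments are passed through to ./configure.
configure_php() {
  src=$1
  shift
  # Subshell so the caller's working directory is left untouched.
  (
    cd "$src" || exit 1
    CFLAGS="-fpic -fpie -O3 -march=x86-64-v3" ./configure \
      --disable-all --with-openssl --enable-opcache=shared --with-zlib \
      --with-zip --with-bz2 --enable-dom --enable-simplexml --enable-gd \
      --enable-posix --enable-pcntl --with-libxml --with-readline "$@"
  )
}
```

For example, `configure_php ~/php-src --enable-zts && make -C ~/php-src -j"$(nproc)"` would configure and build a ZTS variant; the resulting CLI binary ends up in `sapi/cli/php` inside the source tree.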
Static and dynamic compilation don't make a difference in runtime performance. I ran static and shared builds 5x each and got results within 1% in either direction at most. This was more or less expected: static compilation should allow for greater optimization in theory, but I suppose something this complex just doesn't make proper use of it. All following tests are run against shared libraries because recompiling statically every time would take longer.
ZTS and NTS don't make a big difference either: NTS is roughly 1-2% faster on average. The largest difference between runs I saw was 4.5% in favour of NTS, but I also saw runs where ZTS was faster.
zig-cc (LLVM 20, target native-native-gnu, so host glibc) and Clang 19 make no difference in terms of runtime performance: Zig NTS scores 116k vs Clang NTS with... 116k.
LTO made a negligible (~2%) difference in performance, but an insane difference in compile time (2x longer with thin LTO, 7x with fat LTO!). zig cc LTO ZTS, at 116k, is on par with NTS. I raised an issue with PHP to make PHP <= 8.3 compatible with LTO. The big issue here is that all our libraries also link their own programs with -flto unnecessarily; if we could apply LTO only when linking PHP and its extensions, the difference in total build time would probably be only ~20%.
Now to the unfortunate part... zig-cc or Clang vs GCC (14.2) makes a massive performance difference: GCC builds are 22-27% faster (136k ZTS, 137k NTS). This is due to GCC global register variables; when building with the ./configure --disable-gcc-global-regs option, performance is slightly slower than Clang's. I haven't tested the old CentOS 7 (GCC 10) for performance yet.
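To isolate the effect of global register variables, one option is to configure two otherwise identical copies of the same source tree that differ only in --disable-gcc-global-regs, then benchmark both. This is a sketch under assumptions: `global_regs_variants` is a hypothetical helper, `$1` is an unpacked php-src tree, the copies are created next to it with `-with-regs` / `-without-regs` suffixes, and only the minimal `--disable-all` set is used, so add your own extension flags as needed. Each copy still has to be built with make afterwards.

```shell
#!/bin/sh
# Hypothetical helper: create two copies of a php-src tree and configure them
# with GCC, once with global register variables (the default) and once with
# --disable-gcc-global-regs, so phpbench can compare the two builds.
global_regs_variants() {
  src=$1
  for variant in with-regs without-regs; do
    dest="$src-$variant"
    rm -rf "$dest"
    cp -r "$src" "$dest" || return 1
    extra=
    if [ "$variant" = without-regs ]; then
      extra=--disable-gcc-global-regs
    fi
    # $extra is intentionally unquoted so an empty value adds no argument.
    (cd "$dest" && CC=gcc ./configure --disable-all $extra) || return 1
  done
}
```

After `global_regs_variants ~/php-src`, running make in `~/php-src-with-regs` and `~/php-src-without-regs` and pointing PHP_BIN at each `sapi/cli/php` should give a like-for-like comparison where the only difference is the global register setting.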
Remi's GCC build performed far better than anything I compiled locally: 160k, compared to 137k for my local build with higher optimisation flags. Locally, gcc -O2 also does slightly better, at 139k. Not sure why; maybe because of different extension sets? I have not found the reason yet. When I rebuild his RPM from source, I get the same 160k performance. But when I copy its CFLAGS and LDFLAGS and apply them to either a static or dynamic manual build, I'm back to ~137k.
Side note: GCC fails with LTO when global registers are used. But global registers give a ~25% speed increase, while LTO only offers ~2%. Clang does not support global registers yet: https://clang.llvm.org/docs/UsersManual.html#gcc-extensions-not-implemented-yet
Edit: Shivam Mathur's PHP ZTS build is also missing this 25% (or more, in this case).
Edit 2: The official AppStream PHP NTS is as fast as Remi's, so the difference must somehow be related to the rpmbuild system.
Edit 3: The official Ubuntu images' NTS build scores 135k...
I'm not really getting any further, sadly. Here's a written-down version of how to reproduce it:
RHEL 10 installation required.
Installing the prerequisites:
sudo subscription-manager repos --enable codeready-builder-for-rhel-10-x86_64-rpms
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-10.noarch.rpm
sudo dnf install https://rpms.remirepo.net/enterprise/remi-release-10.rpm
sudo dnf module enable php:remi-8.4
sudo dnf install php
sudo dnf install phoronix-test-suite
Testing:
PHP_BIN=/usr/bin/php phoronix-test-suite benchmark phpbench
# compile php with static-php-cli
PHP_BIN=~/static-php-cli/buildroot/bin/php phoronix-test-suite benchmark phpbench
Building the RPMs yourself:
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y dnf-utils rpm-build rpmdevtools yum-utils gcc gcc-c++ libxml2-devel openssl-devel bzip2-devel libicu-devel
rpmdev-setuptree
git clone https://git.remirepo.net/rpms/php/php84.git
mv php84/php84.spec ~/rpmbuild/SPECS/
mv php84/* ~/rpmbuild/SOURCES/
wget https://www.php.net/distributions/php-8.4.11.tar.xz
wget https://www.php.net/distributions/php-8.4.11.tar.xz.asc
mv php-8.4.11.tar.xz php-8.4.11.tar.xz.asc ~/rpmbuild/SOURCES/
cd ~/rpmbuild/SPECS
sudo dnf builddep php84.spec
rpmbuild -ba php84.spec
You can install them with sudo dnf install ~/rpmbuild/RPMS/x86_64/php*.rpm.
I'm finally making some progress. I found a few ./configure settings that have a small impact on performance and... this one should have been obvious... realized that Remi's/the system PHP on RHEL is the only build that enables opcache for the CLI...
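One way to turn opcache on for the CLI without editing the global php.ini is an extra ini directory picked up through PHP_INI_SCAN_DIR; a leading `:` in that variable keeps the compiled-in scan directory as well. A minimal sketch, assuming a placeholder directory; `write_opcache_cli_ini` is a hypothetical helper. The `zend_extension=opcache` line is only written when the second argument is `shared` (matching a --enable-opcache=shared build); a build with opcache compiled in statically should not load it a second time.

```shell
#!/bin/sh
# Hypothetical helper: write an ini file that enables opcache for the CLI SAPI.
# $1: directory to write into (placeholder path)
# $2: "shared" if opcache was built with --enable-opcache=shared and must be loaded
write_opcache_cli_ini() {
  dir=$1
  mode=${2:-static}
  mkdir -p "$dir" || return 1
  {
    if [ "$mode" = shared ]; then
      echo "zend_extension=opcache"
    fi
    echo "opcache.enable_cli=1"
  } > "$dir/99-opcache-cli.ini"
}
```

For example, after `write_opcache_cli_ini ~/php-cli-ini`, running `PHP_INI_SCAN_DIR=":$HOME/php-cli-ini" "$PHP_BIN" -i | grep opcache.enable_cli` should report `On` if the file was picked up. Whether phoronix-test-suite passes PHP_INI_SCAN_DIR through to the benchmark process is an assumption worth checking before trusting the numbers.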
PHP_BIN=/home/m/static-php-cli/buildroot/bin/php phoronix-test-suite benchmark phpbench
->
PHP Benchmark Suite:
1712465
1601557
1685342
1608505
1697826
1618018
1715997
1363717
1687988
1609318
1647410
1582247
Average: 1627533 Score
Deviation: 5.87%
Samples: 12
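The Average and Deviation lines can be recomputed from the individual samples to see how they are derived. The sketch below assumes that Phoronix's Deviation is the sample standard deviation (n - 1 denominator) divided by the mean, expressed in percent; under that assumption it reproduces both result blocks in this thread. `phoronix_stats` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: recompute Phoronix-style "Average" and "Deviation" lines
# from one benchmark score per line on stdin.
# Assumption: Deviation = sample standard deviation (n - 1) / mean, in percent.
phoronix_stats() {
  awk '
    NF { x[++n] = $1; sum += $1 }
    END {
      if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
      sd = sqrt(ss / (n - 1))
      # Round half up, so a mean of 1627532.5 prints as 1627533.
      printf "Average: %d Score\n", int(mean + 0.5)
      printf "Deviation: %.2f%%\n", 100 * sd / mean
      printf "Samples: %d\n", n
    }'
}
```

Feeding it the 12 scores above yields Average 1627533, Deviation 5.87% and 12 samples, and `printf '%s\n' 1816379 1799310 1797994 | phoronix_stats` yields Average 1804561 with a Deviation of 0.57%. Dividing by n instead of n - 1 would give about 5.62% for the first block, which does not match the reported value.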
I will think about ways to integrate these into spc after the Zig PR is merged. I'll also have to redo the Zig tests to see whether this brings them reasonably close without global registers, but with PGO.
We're cooking
PHP Benchmark Suite:
1816379
1799310
1797994
Average: 1804561 Score
Deviation: 0.57%
https://github.com/crazywhalecc/static-php-cli/pull/861#issuecomment-3213081930
Keep in mind that I only have a Docker container (x86_64) to test macOS...
Homebrew-installed PHP: 90k points
static-php-cli, Clang -Os: 85k points
static-php-cli, Clang -O3: 93k points
static-php-cli, gcc-15 -Os: 103k points
static-php-cli, gcc-15 -O3: 105k points
static-php-cli, gcc-15 -O3 + opcache: 130k points
Looks like macOS is generally slower than Linux for me, but that could be explained by it running in a Docker container instead of a VM. Still, just like on Linux, gcc-15 produces much faster code than Clang due to global register variables. Could you test this on aarch64?
Well, good news for aarch64 users: Clang and GCC appear to produce identical benchmarks. The bad news is that it's much slower compared to x86_64, which makes sense if the global register optimization is only used on x86_64.
Well, crap, I might have to retract my last statement. I was testing with the -Os default, which performed identically at 43k points for both zig and gcc. Recompiling with -O3 instead, I get:
gcc-11 -O3 w/o opcache: 53k
gcc-11 -O3 w/ opcache: 64k
zig 16 -O3 w/o opcache: 43k
zig 16 -O3 w/ opcache: 51k
So yeah... that's huge if it's also the case on macOS, which I suspect. The source only checks for GNU libc once, but global registers will still be used without _GNU (source). So switching to a PHP build compiled with GCC instead of Clang might provide a ~15-20% performance uplift, @dunglas.
Damn, gcc-14 is consistently faster than gcc-11 on top of that: 67k with -O3. -Ofast is even faster at 70k, but that might not be worth it because it disregards strict standards compliance.
@dunglas Shivam Mathur would understandably prefer to stay in line with the official Homebrew tap, and the official tap closed my issue (which was admittedly misplaced as a bug report, for lack of a better place to raise awareness).
I don't use macOS so it doesn't impact me, but maybe you're interested in pursuing it further.
@henderkes I think you can reopen it on the official Homebrew repo as long as you include the output of the commands they ask for.
It seems that you can force Homebrew to compile with GCC with:
brew install --build-from-source --cc=gcc php
In a twisted turn of events, the new (PHP 8.5+) tail call VM is faster than the hybrid VM that we get with GCC. Should we strive to make Clang the default for 8.5, or keep it consistent with PHP <= 8.4?
Out of curiosity, I just ran the tests on my MacBook Air M2 (2022) (8 cores, battery mode). PHP is the current version from brew: PHP 8.4.14 (cli) (built: Oct 21 2025 19:23:55) (NTS). The result with opcache is: Average: 2,187,914 Score
Homebrew was updated to use GCC, so you're getting the benefit of global register variables with PHP < 8.5.
Thanks, that is great! And the numbers are impressive (on such an old notebook) in comparison to some Windows servers I have access to.
PHP on Windows is scuffed, but Linux performs even better than macOS (by a slim margin now). The M2 is only three years old; my production 7950X3D hits around 1.9 million as well (keep in mind Phoronix measures single-core performance).
Yeah, I know. Using Linux on Windows would be an option to improve speed a bit, but it's not my decision; otherwise I wouldn't develop on macOS and push the finished product to a Windows server :)