Speed up AES-256-GCM on aarch64 to (at least) armv4 level
I have a Raspberry Pi 4 and am benchmarking AES-256-GCM on 32-bit vs 64-bit ARM Debian Bullseye.
OpenSSL 3.0.0 was compiled from source, config dumps:
- arm 64 bit (asm_arch => "aarch64"): https://gist.github.com/rfjakob/82fb1ca5e1f6f7756b7a4b9dc2ca4783
- arm 32 bit (asm_arch => "armv4"): https://gist.github.com/rfjakob/bb999b293201ff257349672e5aa9aeba
Interestingly, the armv4 version is about 2.3 times faster than the aarch64 version at block sizes of 1024 bytes and above (and still about 1.95 times faster at 16 bytes).
arm 32 bit:
root@f13b37d6334c:~/openssl-3.0.0# LD_LIBRARY_PATH=$PWD ./apps/openssl speed -evp aes-256-gcm
[...]
version: 3.0.0
built on: built on: Thu Sep 9 14:48:50 2021 UTC
options:bn(64,32)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0x3
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM         42906.57k    48964.74k    50933.59k    54975.15k    56318.63k    55885.82k
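The CPUINFO line in each run reports the capability mask OpenSSL detected (0x3 here, 0x83 in the 64-bit run). The sketch below decodes such a mask into flag names. The bit-to-name table is an assumption based on my reading of crypto/arm_arch.h in OpenSSL 3.0 and should be checked against the source tree actually built; `decode_armcap` is a hypothetical helper, not part of OpenSSL.

```python
# Assumed bit layout of OPENSSL_armcap, taken from crypto/arm_arch.h as I
# understand it for OpenSSL 3.0. Verify against your own source tree.
ARMCAP_BITS = {
    0: "ARMV7_NEON",
    1: "ARMV7_TICK",
    2: "ARMV8_AES",
    3: "ARMV8_SHA1",
    4: "ARMV8_SHA256",
    5: "ARMV8_PMULL",
    6: "ARMV8_SHA512",
    7: "ARMV8_CPUID",
}


def decode_armcap(value):
    """Return the names of all capability bits set in `value` (hypothetical helper)."""
    return [name for bit, name in sorted(ARMCAP_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The two masks reported by the speed runs in this issue.
    for label, mask in (("armv4", 0x3), ("aarch64", 0x83)):
        print(f"{label}: {mask:#x} -> {', '.join(decode_armcap(mask))}")
```

If that bit layout is right, neither build reports ARMV8_AES or ARMV8_PMULL, so the gap would be between two software code paths rather than between a hardware-accelerated path and a software one.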
arm 64 bit:
root@b135577e8e14:~/openssl-3.0.0# LD_LIBRARY_PATH=$PWD ./apps/openssl speed -evp aes-256-gcm
[...]
version: 3.0.0
built on: built on: Thu Sep 9 15:45:52 2021 UTC
options:bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0x83
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM         22017.32k    23275.02k    23746.56k    23875.93k    23953.41k    24033.52k
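To make the gap explicit per block size, here is a small sketch that divides the armv4 throughput by the aarch64 throughput. The numbers are copied from the two runs above; the names `ARMV4`, `AARCH64` and `speedups` are only illustrative.

```python
# Throughput from the two `openssl speed -evp aes-256-gcm` runs in this issue,
# in units of 1000 bytes per second.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
ARMV4 = [42906.57, 48964.74, 50933.59, 54975.15, 56318.63, 55885.82]
AARCH64 = [22017.32, 23275.02, 23746.56, 23875.93, 23953.41, 24033.52]


def speedups(fast, slow):
    """Element-wise ratio fast/slow for two equally long lists of throughputs."""
    return [f / s for f, s in zip(fast, slow)]


if __name__ == "__main__":
    for size, ratio in zip(BLOCK_SIZES, speedups(ARMV4, AARCH64)):
        print(f"{size:>6} bytes: armv4 is {ratio:.2f}x aarch64")
```

The ratio grows from roughly 1.95x at 16 bytes to about 2.35x at 8192 bytes, so the 32-bit build is ahead at every block size, not just for large buffers.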