
Speed up AES-256-GCM on aarch64 to (at least) armv4 level

Open rfjakob opened this issue 3 years ago • 8 comments

I have a Raspberry Pi 4 and am benchmarking AES-256-GCM on 32-bit vs. 64-bit ARM Debian Bullseye.

OpenSSL 3.0.0 was compiled from source, config dumps:

  • arm 64 bit (asm_arch => "aarch64"): https://gist.github.com/rfjakob/82fb1ca5e1f6f7756b7a4b9dc2ca4783
  • arm 32 bit (asm_arch => "armv4"): https://gist.github.com/rfjakob/bb999b293201ff257349672e5aa9aeba

Interestingly, the armv4 build is about 2.3 times faster than the aarch64 build at block sizes of 1024 bytes and above (about 1.95 times at 16 bytes).
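As a quick sanity check of that ratio, here is a small Python sketch that recomputes it per block size from the throughput figures in the two `openssl speed` runs quoted in this report. The names `BLOCK_SIZES`, `ARMV4`, `AARCH64` and `speedups` are made up for illustration; only the numbers come from the benchmark output.

```python
# Throughput in 1000s of bytes per second, copied from the two
# `openssl speed -evp aes-256-gcm` runs in this report.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
ARMV4 = [42906.57, 48964.74, 50933.59, 54975.15, 56318.63, 55885.82]
AARCH64 = [22017.32, 23275.02, 23746.56, 23875.93, 23953.41, 24033.52]


def speedups(fast, slow):
    """Return the element-wise ratio fast / slow."""
    return [f / s for f, s in zip(fast, slow)]


# How many times faster the armv4 build is at each block size.
RATIOS = speedups(ARMV4, AARCH64)

if __name__ == "__main__":
    for size, ratio in zip(BLOCK_SIZES, RATIOS):
        print(f"{size:>6} bytes: armv4 is {ratio:.2f}x faster")
```

The ratio grows from roughly 1.95x at 16 bytes to about 2.3x from 1024 bytes upward, so the gap is not just per-call overhead on small inputs.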

arm 32 bit:

root@f13b37d6334c:~/openssl-3.0.0# LD_LIBRARY_PATH=$PWD ./apps/openssl speed -evp aes-256-gcm
[...]
version: 3.0.0
built on: built on: Thu Sep  9 14:48:50 2021 UTC
options:bn(64,32) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0x3
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM      42906.57k    48964.74k    50933.59k    54975.15k    56318.63k    55885.82k

arm 64 bit:

root@b135577e8e14:~/openssl-3.0.0# LD_LIBRARY_PATH=$PWD ./apps/openssl speed -evp aes-256-gcm
[...]
version: 3.0.0
built on: built on: Thu Sep  9 15:45:52 2021 UTC
options:bn(64,64) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0x83
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM      22017.32k    23275.02k    23746.56k    23875.93k    23953.41k    24033.52k
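
The two runs also report different capability masks (`OPENSSL_armcap=0x3` vs `0x83`). The sketch below decodes them, assuming the bit layout from `crypto/arm_arch.h` in OpenSSL 3.0 as I recall it; the bit names are an assumption and should be checked against the source tree that was actually built. `decode_armcap` is a hypothetical helper, not an OpenSSL API.

```python
# Assumed bit assignments, recalled from OpenSSL 3.0's crypto/arm_arch.h.
# Verify against your own source tree before relying on them.
ARMCAP_BITS = {
    0: "ARMV7_NEON",
    1: "ARMV7_TICK",
    2: "ARMV8_AES",
    3: "ARMV8_SHA1",
    4: "ARMV8_SHA256",
    5: "ARMV8_PMULL",
    6: "ARMV8_SHA512",
    7: "ARMV8_CPUID",
}


def decode_armcap(value):
    """Return the names of the capability bits set in an armcap mask."""
    return [name for bit, name in ARMCAP_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    for label, mask in (("armv4", 0x3), ("aarch64", 0x83)):
        print(f"{label}: {mask:#x} -> {', '.join(decode_armcap(mask))}")
```

Under that assumed layout, neither mask has the `ARMV8_AES` or `ARMV8_PMULL` bit set, which would suggest that neither run uses the hardware AES/PMULL instructions and that both are comparing software code paths.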

rfjakob, Sep 09 '21 16:09