wolfssl
AES RISC-V 64-bit ASM: ECB/CBC/CTR/GCM/CCM
Description
Add assembly implementations of AES-ECB/CBC/CTR/GCM/CCM for RISC-V. The assembly comes in variants that use standard, scalar cryptography, or vector cryptography instructions.
Testing
./configure --enable-all --enable-riscv-asm
Checklist
- [x] added tests
- [ ] updated/added doxygen
- [ ] updated appropriate READMEs
- [ ] Updated manual and documentation
retest this please
@SparkiDev I've got a few RISC-V targets here now, so I will try this on actual HW.
HiFive Unleashed at 1.4GHz. The new asm is up to about 50 times faster.
./configure --enable-riscv-asm && make
root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc 20 MiB took 1.076 seconds, 18.588 MiB/s
AES-128-CBC-dec 20 MiB took 1.083 seconds, 18.473 MiB/s
AES-192-CBC-enc 20 MiB took 1.245 seconds, 16.062 MiB/s
AES-192-CBC-dec 20 MiB took 1.246 seconds, 16.047 MiB/s
AES-256-CBC-enc 15 MiB took 1.057 seconds, 14.189 MiB/s
AES-256-CBC-dec 15 MiB took 1.055 seconds, 14.212 MiB/s
AES-128-GCM-enc 15 MiB took 1.300 seconds, 11.543 MiB/s
AES-128-GCM-dec 15 MiB took 1.300 seconds, 11.535 MiB/s
AES-192-GCM-enc 15 MiB took 1.425 seconds, 10.526 MiB/s
AES-192-GCM-dec 15 MiB took 1.425 seconds, 10.523 MiB/s
AES-256-GCM-enc 10 MiB took 1.032 seconds, 9.687 MiB/s
AES-256-GCM-dec 10 MiB took 1.032 seconds, 9.691 MiB/s
GMAC Table 4-bit 31 MiB took 1.025 seconds, 30.251 MiB/s
Benchmark complete
On master
./configure --enable-all && make
root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc 5 MiB took 12.798 seconds, 0.391 MiB/s
AES-128-CBC-dec 5 MiB took 12.672 seconds, 0.395 MiB/s
AES-192-CBC-enc 5 MiB took 15.301 seconds, 0.327 MiB/s
AES-192-CBC-dec 5 MiB took 15.181 seconds, 0.329 MiB/s
AES-256-CBC-enc 5 MiB took 17.820 seconds, 0.281 MiB/s
AES-256-CBC-dec 5 MiB took 17.669 seconds, 0.283 MiB/s
AES-128-GCM-enc 5 MiB took 12.870 seconds, 0.388 MiB/s
AES-128-GCM-dec 5 MiB took 12.870 seconds, 0.388 MiB/s
AES-192-GCM-enc 5 MiB took 15.375 seconds, 0.325 MiB/s
AES-192-GCM-dec 5 MiB took 15.376 seconds, 0.325 MiB/s
AES-256-GCM-enc 5 MiB took 17.878 seconds, 0.280 MiB/s
AES-256-GCM-dec 5 MiB took 17.896 seconds, 0.279 MiB/s
AES-128-GCM-STREAM-enc 5 MiB took 12.878 seconds, 0.388 MiB/s
AES-128-GCM-STREAM-dec 5 MiB took 12.878 seconds, 0.388 MiB/s
AES-192-GCM-STREAM-enc 5 MiB took 15.379 seconds, 0.325 MiB/s
AES-192-GCM-STREAM-dec 5 MiB took 15.385 seconds, 0.325 MiB/s
AES-256-GCM-STREAM-enc 5 MiB took 17.881 seconds, 0.280 MiB/s
AES-256-GCM-STREAM-dec 5 MiB took 17.888 seconds, 0.280 MiB/s
GMAC Table 4-bit 30 MiB took 1.006 seconds, 29.831 MiB/s
Benchmark complete
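To put a number on the speedup, the MiB/s column of the two runs above can be divided row by row. The snippet below is only an illustrative sketch: the helper names `parse` and `speedups` are made up for this comment, and only a few rows from each log are pasted in. Note that the asm run above was configured without `--enable-all` while the master run used it, so the ratios are indicative rather than a strict like-for-like comparison.

```python
import re

# A few rows copied from the asm-branch run and the master run above.
ASM = """\
AES-128-CBC-enc 20 MiB took 1.076 seconds, 18.588 MiB/s
AES-256-CBC-enc 15 MiB took 1.057 seconds, 14.189 MiB/s
AES-128-GCM-enc 15 MiB took 1.300 seconds, 11.543 MiB/s
AES-256-GCM-enc 10 MiB took 1.032 seconds, 9.687 MiB/s
"""
MASTER = """\
AES-128-CBC-enc 5 MiB took 12.798 seconds, 0.391 MiB/s
AES-256-CBC-enc 5 MiB took 17.820 seconds, 0.281 MiB/s
AES-128-GCM-enc 5 MiB took 12.870 seconds, 0.388 MiB/s
AES-256-GCM-enc 5 MiB took 17.878 seconds, 0.280 MiB/s
"""

# Matches "<name> <n> MiB took <t> seconds, <rate> MiB/s".
LINE = re.compile(
    r"^(\S+)\s+\d+\s+MiB\s+took\s+[\d.]+\s+seconds,\s+([\d.]+)\s+MiB/s$"
)


def parse(text):
    """Return {algorithm name: throughput in MiB/s} for matching lines."""
    rates = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            rates[m.group(1)] = float(m.group(2))
    return rates


def speedups(asm_text, master_text):
    """Return {algorithm name: asm throughput / master throughput}."""
    asm, master = parse(asm_text), parse(master_text)
    return {name: asm[name] / master[name] for name in asm if name in master}


if __name__ == "__main__":
    for name, ratio in speedups(ASM, MASTER).items():
        print(f"{name}: {ratio:.1f}x")
```

For the rows pasted here the ratio is close to 50x for AES-CBC and in the 30x to 35x range for AES-GCM.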
./configure --enable-all --enable-riscv-asm
wolfcrypt/src/aes.c: In function '_AesXtsHelper':
wolfcrypt/src/aes.c:12631:16: error: implicit declaration of function '_AesEcbEncrypt'; did you mean 'wc_AesEcbEncrypt'? [-Werror=implicit-function-declaration]
12631 |         return _AesEcbEncrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbEncrypt
wolfcrypt/src/aes.c:12631:16: error: nested extern declaration of '_AesEcbEncrypt' [-Werror=nested-externs]
wolfcrypt/src/aes.c:12634:16: error: implicit declaration of function '_AesEcbDecrypt'; did you mean 'wc_AesEcbDecrypt'? [-Werror=implicit-function-declaration]
12634 |         return _AesEcbDecrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbDecrypt
wolfcrypt/src/aes.c:12634:16: error: nested extern declaration of '_AesEcbDecrypt' [-Werror=nested-externs]
@SparkiDev says AES XTS is not yet supported with the RISC-V ASM. Note: I tried ./configure --enable-all --disable-aesxtx --enable-riscv-asm, but that didn't work. We normally support disabling a specific option when using --enable-all. Sean, please review.
@SparkiDev is this RISC-V ASM PR ready for merge? I can’t tell if you are planning to push anything else to it.
Fixed --enable-all to work.
retest this please
Updated benchmarks:
HiFive Unleashed at 1.4GHz
./configure --enable-all --enable-riscv-asm
make
root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG 10 MiB took 1.488 seconds, 6.721 MiB/s
AES-128-CBC-enc 20 MiB took 1.139 seconds, 17.554 MiB/s
AES-128-CBC-dec 20 MiB took 1.145 seconds, 17.470 MiB/s
AES-192-CBC-enc 20 MiB took 1.321 seconds, 15.144 MiB/s
AES-192-CBC-dec 20 MiB took 1.321 seconds, 15.139 MiB/s
AES-256-CBC-enc 15 MiB took 1.115 seconds, 13.450 MiB/s
AES-256-CBC-dec 15 MiB took 1.123 seconds, 13.361 MiB/s
AES-128-GCM-enc 15 MiB took 1.395 seconds, 10.750 MiB/s
AES-128-GCM-dec 15 MiB took 1.372 seconds, 10.933 MiB/s
AES-192-GCM-enc 10 MiB took 1.007 seconds, 9.930 MiB/s
AES-192-GCM-dec 10 MiB took 1.006 seconds, 9.940 MiB/s
AES-256-GCM-enc 10 MiB took 1.088 seconds, 9.188 MiB/s
AES-256-GCM-dec 10 MiB took 1.088 seconds, 9.192 MiB/s
GMAC Table 4-bit 31 MiB took 1.029 seconds, 30.136 MiB/s
AES-128-ECB-enc 22 MiB took 1.218 seconds, 18.063 MiB/s
AES-128-ECB-dec 22 MiB took 1.209 seconds, 18.191 MiB/s
AES-192-ECB-enc 22 MiB took 1.414 seconds, 15.556 MiB/s
AES-192-ECB-dec 22 MiB took 1.406 seconds, 15.644 MiB/s
AES-256-ECB-enc 22 MiB took 1.601 seconds, 13.740 MiB/s
AES-256-ECB-dec 22 MiB took 1.608 seconds, 13.677 MiB/s
AES-XTS-enc 15 MiB took 1.193 seconds, 12.569 MiB/s
AES-XTS-dec 15 MiB took 1.190 seconds, 12.608 MiB/s
AES-128-CFB 20 MiB took 1.319 seconds, 15.167 MiB/s
AES-192-CFB 15 MiB took 1.115 seconds, 13.447 MiB/s
AES-256-CFB 15 MiB took 1.240 seconds, 12.092 MiB/s
AES-128-OFB 20 MiB took 1.316 seconds, 15.202 MiB/s
AES-192-OFB 15 MiB took 1.114 seconds, 13.461 MiB/s
AES-256-OFB 15 MiB took 1.240 seconds, 12.094 MiB/s
AES-128-CTR 20 MiB took 1.134 seconds, 17.639 MiB/s
AES-192-CTR 20 MiB took 1.317 seconds, 15.181 MiB/s
AES-256-CTR 15 MiB took 1.109 seconds, 13.526 MiB/s
AES-CCM-enc 10 MiB took 1.087 seconds, 9.202 MiB/s
AES-CCM-dec 10 MiB took 1.088 seconds, 9.194 MiB/s
AES-256-SIV-enc 10 MiB took 1.151 seconds, 8.686 MiB/s
AES-256-SIV-dec 10 MiB took 1.149 seconds, 8.704 MiB/s
AES-384-SIV-enc 10 MiB took 1.330 seconds, 7.521 MiB/s
AES-384-SIV-dec 10 MiB took 1.329 seconds, 7.526 MiB/s
AES-512-SIV-enc 10 MiB took 1.497 seconds, 6.681 MiB/s
AES-512-SIV-dec 10 MiB took 1.496 seconds, 6.683 MiB/s
Camellia 15 MiB took 1.297 seconds, 11.563 MiB/s
ARC4 30 MiB took 1.121 seconds, 26.756 MiB/s
CHACHA 30 MiB took 1.016 seconds, 29.525 MiB/s
CHA-POLY 25 MiB took 1.140 seconds, 21.934 MiB/s
3DES 5 MiB took 1.632 seconds, 3.064 MiB/s
MD5 75 MiB took 1.050 seconds, 71.403 MiB/s
POLY1305 90 MiB took 1.053 seconds, 85.442 MiB/s
SHA 35 MiB took 1.101 seconds, 31.787 MiB/s
SHA-224 20 MiB took 1.112 seconds, 17.980 MiB/s
SHA-256 20 MiB took 1.114 seconds, 17.952 MiB/s
SHA-384 15 MiB took 1.359 seconds, 11.038 MiB/s
SHA-512 15 MiB took 1.315 seconds, 11.406 MiB/s
SHA-512/224 15 MiB took 1.461 seconds, 10.269 MiB/s
SHA-512/256 15 MiB took 1.461 seconds, 10.266 MiB/s
SHA3-224 20 MiB took 1.187 seconds, 16.849 MiB/s
SHA3-256 20 MiB took 1.250 seconds, 15.998 MiB/s
SHA3-384 15 MiB took 1.197 seconds, 12.532 MiB/s
SHA3-512 10 MiB took 1.140 seconds, 8.770 MiB/s
SHAKE128 20 MiB took 1.034 seconds, 19.339 MiB/s
SHAKE256 20 MiB took 1.250 seconds, 16.002 MiB/s
RIPEMD 20 MiB took 1.071 seconds, 18.679 MiB/s
BLAKE2b 30 MiB took 1.155 seconds, 25.973 MiB/s
BLAKE2s 20 MiB took 1.202 seconds, 16.637 MiB/s
AES-128-CMAC 20 MiB took 1.166 seconds, 17.149 MiB/s
AES-256-CMAC 15 MiB took 1.136 seconds, 13.200 MiB/s
HMAC-MD5 75 MiB took 1.050 seconds, 71.403 MiB/s
HMAC-SHA 35 MiB took 1.099 seconds, 31.834 MiB/s
HMAC-SHA224 20 MiB took 1.115 seconds, 17.931 MiB/s
HMAC-SHA256 20 MiB took 1.116 seconds, 17.921 MiB/s
HMAC-SHA384 20 MiB took 1.134 seconds, 17.640 MiB/s
HMAC-SHA512 20 MiB took 1.182 seconds, 16.917 MiB/s
PBKDF2 2 KiB took 1.011 seconds, 2.195 KiB/s
SipHash-8 130 MiB took 1.018 seconds, 127.690 MiB/s
SipHash-16 130 MiB took 1.018 seconds, 127.697 MiB/s
KDF 128 SRTP 205045 ops took 1.000 sec, avg 0.005 ms, 205041.431 ops/sec
KDF 256 SRTP 140095 ops took 1.000 sec, avg 0.007 ms, 140092.996 ops/sec
KDF 128 SRTCP 204845 ops took 1.000 sec, avg 0.005 ms, 204843.486 ops/sec
KDF 256 SRTCP 139070 ops took 1.000 sec, avg 0.007 ms, 139067.480 ops/sec
scrypt 17 10 ops took 5.608 sec, avg 560.843 ms, 1.783 ops/sec
RSA 1024 key gen 6 ops took 1.163 sec, avg 193.831 ms, 5.159 ops/sec
RSA 2048 key gen 1 ops took 2.187 sec, avg 2186.849 ms, 0.457 ops/sec
RSA 2048 public 1400 ops took 1.065 sec, avg 0.761 ms, 1314.340 ops/sec
RSA 2048 private 100 ops took 3.932 sec, avg 39.325 ms, 25.429 ops/sec
DH 2048 key gen 109 ops took 1.007 sec, avg 9.242 ms, 108.205 ops/sec
DH 2048 agree 100 ops took 1.953 sec, avg 19.530 ms, 51.202 ops/sec
ECC [ SECP256R1] 256 key gen 1000 ops took 1.065 sec, avg 1.065 ms, 939.342 ops/sec
ECDHE [ SECP256R1] 256 agree 1000 ops took 1.014 sec, avg 1.014 ms, 985.994 ops/sec
ECDSA [ SECP256R1] 256 sign 900 ops took 1.112 sec, avg 1.236 ms, 809.309 ops/sec
ECDSA [ SECP256R1] 256 verify 700 ops took 1.030 sec, avg 1.472 ms, 679.428 ops/sec
ECC [ SECP256R1] 256 encrypt 900 ops took 1.051 sec, avg 1.168 ms, 856.368 ops/sec
ECC [ SECP256R1] 256 decrypt 800 ops took 1.106 sec, avg 1.382 ms, 723.377 ops/sec
ECC [BRAINPOOLP256R1] 256 key gen 900 ops took 1.080 sec, avg 1.200 ms, 833.102 ops/sec
ECDHE [BRAINPOOLP256R1] 256 agree 900 ops took 1.034 sec, avg 1.149 ms, 870.528 ops/sec
ECDSA [BRAINPOOLP256R1] 256 sign 800 ops took 1.093 sec, avg 1.366 ms, 731.855 ops/sec
ECDSA [BRAINPOOLP256R1] 256 verify 700 ops took 1.088 sec, avg 1.554 ms, 643.652 ops/sec
ECC [BRAINPOOLP256R1] 256 encrypt 800 ops took 1.044 sec, avg 1.305 ms, 766.018 ops/sec
ECC [BRAINPOOLP256R1] 256 decrypt 700 ops took 1.101 sec, avg 1.574 ms, 635.508 ops/sec
CURVE 25519 key gen 1154 ops took 1.000 sec, avg 0.867 ms, 1153.836 ops/sec
CURVE 25519 agree 1200 ops took 1.013 sec, avg 0.844 ms, 1184.526 ops/sec
ED 25519 key gen 2273 ops took 1.000 sec, avg 0.440 ms, 2272.384 ops/sec
ED 25519 sign 2100 ops took 1.032 sec, avg 0.491 ms, 2035.573 ops/sec
ED 25519 verify 1000 ops took 1.035 sec, avg 1.035 ms, 966.428 ops/sec
CURVE 448 key gen 373 ops took 1.002 sec, avg 2.685 ms, 372.413 ops/sec
CURVE 448 agree 400 ops took 1.063 sec, avg 2.659 ms, 376.125 ops/sec
ED 448 key gen 746 ops took 1.000 sec, avg 1.341 ms, 745.990 ops/sec
ED 448 sign 800 ops took 1.121 sec, avg 1.401 ms, 713.916 ops/sec
ED 448 verify 400 ops took 1.316 sec, avg 3.289 ms, 303.998 ops/sec
ECCSI 256 key gen 774 ops took 1.001 sec, avg 1.293 ms, 773.568 ops/sec
ECCSI 256 pair gen 937 ops took 1.000 sec, avg 1.067 ms, 936.782 ops/sec
ECCSI 256 valid 582 ops took 1.000 sec, avg 1.719 ms, 581.719 ops/sec
ECCSI 256 sign 854 ops took 1.001 sec, avg 1.172 ms, 853.512 ops/sec
ECCSI 256 verify 247 ops took 1.000 sec, avg 4.049 ms, 246.951 ops/sec
SAKKE 1024 key gen 15 ops took 1.008 sec, avg 67.193 ms, 14.882 ops/sec
SAKKE 1024 rsk gen 39 ops took 1.017 sec, avg 26.083 ms, 38.339 ops/sec
SAKKE 1024 valid 4 ops took 1.105 sec, avg 276.192 ms, 3.621 ops/sec
SAKKE 1024 encap-1 6 ops took 1.037 sec, avg 172.764 ms, 5.788 ops/sec
SAKKE 1024 derive-1 4 ops took 1.189 sec, avg 297.185 ms, 3.365 ops/sec
SAKKE 1024 encap-2 6 ops took 1.032 sec, avg 172.008 ms, 5.814 ops/sec
SAKKE 1024 derive-2 4 ops took 1.188 sec, avg 296.979 ms, 3.367 ops/sec
SAKKE 1024 derive-3 4 ops took 1.187 sec, avg 296.857 ms, 3.369 ops/sec
SAKKE 1024 derive-4 4 ops took 1.188 sec, avg 296.881 ms, 3.368 ops/sec
Benchmark complete
@dgarske - question on the benchmarks: fixed data size vs fixed time.
On the master and riscv_aes_asm branches you ran these commands, respectively:
# before, on master
./configure --enable-all
vs
# after, with ASM Optimization
./configure --enable-all --enable-riscv-asm
Then, for comparison, you ran this for both:
./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
On master, the benchmark processed a fixed 5 MiB of data and timed how long that took, 12.798 seconds in this example:
AES-128-CBC-enc 5 MiB took 12.798 seconds, 0.391 MiB/s
On riscv_aes_asm, the benchmark stopped shortly after a fixed one-second duration and reported how much data had been processed:
AES-128-CBC-enc 20 MiB took 1.076 seconds, 18.588 MiB/s
Why the difference in fixed data size vs fixed time?
Additionally, perhaps just nit-picky, but I'm curious: the reported bits size also differs. bits=4096:
# master
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
vs bits=3072
# ASM
Math: Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
It appears that ./configure --enable-all --enable-riscv-asm produces a different user_settings.h than ./configure --enable-all, affecting more than just the assembly optimization. Perhaps it should be consistent, at least for the benchmark configuration? I'm also left wondering whether, in a real apples-to-apples comparison with master set to bits=3072, there would still be a performance difference.
In any case - that's an astonishing performance boost by @SparkiDev :)
The amount of data processed is the number of 1048576-byte (1 MiB) buffers encrypted/decrypted. We process a minimum number of buffers regardless of platform, and keep going until at least 1 second has elapsed.
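As a rough illustration of that stopping rule (not the actual benchmark source), the loop can be sketched as below. The function name `run_benchmark`, the injected `clock` parameter, and the batch size of 5 buffers are assumptions for this sketch. The 5 is only inferred from the "5 MiB" rows in the master output. The 1 MiB block size and the 1.0 second minimum come from the benchmark header.

```python
import time

BLOCK_BYTES = 1048576  # "block bytes 1048576" in the benchmark header
MIN_BLOCKS = 5         # assumption: inferred from the "5 MiB" rows on master
MIN_SECONDS = 1.0      # "min 1.0 sec each" in the benchmark header


def run_benchmark(process_block, clock=time.perf_counter,
                  min_blocks=MIN_BLOCKS, min_seconds=MIN_SECONDS):
    """Process batches of blocks until at least min_seconds has elapsed.

    A whole batch always completes before the clock is checked, so a slow
    platform still processes min_blocks blocks even if that takes far
    longer than min_seconds.
    Returns (blocks processed, elapsed seconds, blocks per second).
    """
    start = clock()
    blocks = 0
    while True:
        for _ in range(min_blocks):
            process_block()
        blocks += min_blocks
        elapsed = clock() - start
        if elapsed >= min_seconds:
            break
    return blocks, elapsed, blocks / elapsed


if __name__ == "__main__":
    buf = bytearray(BLOCK_BYTES)
    # Stand-in workload: copy a 1 MiB buffer instead of encrypting it.
    blocks, elapsed, rate = run_benchmark(lambda: bytes(buf))
    print(f"{blocks} MiB took {elapsed:.3f} seconds, {rate:.3f} MiB/s")
```

Under this rule a slow implementation finishes exactly one batch and reports however long that took, as in the 5 MiB rows on master. A fast one repeats batches until the one-second mark is passed, which is why the asm rows show larger data sizes and times just over one second.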
I have no idea why the number of bits in SP changed though.