arm64 kernels: add accelerated crc32 routines

Open kjbracey2 opened this issue 3 years ago • 4 comments

Incorporate changes from Linux 4.20/4.21 to accelerate the kernel's crc32_le and __crc32c_le helpers.

Incorporates:

9784d82db ("make core crc32() routines weak so they can be overridden")
7481cddf2 ("arm64/lib: add accelerated crc32 routines")
efdb25efc ("arm64/lib: improve CRC32 performance for deep pipelines")
ff98e20ef ("lib/crc32.c: mark crc32_le_base/__crc32_le_base alias as __pure")

This omits the upstream runtime selection, because it relies on machinery that differs significantly in Linux 4.1. Instead, we assume CRC support is always available.
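As an illustration of "assume CRC support is always available", here is a minimal userspace sketch, not the patch itself. It selects the hardware path at compile time through the ACLE macro `__ARM_FEATURE_CRC32` and the `__crc32d`/`__crc32b` intrinsics from `<arm_acle.h>`, and falls back to a bitwise loop elsewhere. The function name `crc32_le_sketch` is hypothetical; the contract (caller-supplied seed, no final inversion) is meant to mirror the kernel's `crc32_le`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* ACLE CRC32 intrinsics */
#endif

/* Hypothetical name. Seed is supplied by the caller; no final inversion. */
uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
#if defined(__ARM_FEATURE_CRC32)
    /* Hardware path: 8 bytes per instruction (assumes a little-endian target). */
    while (len >= 8) {
        uint64_t v;
        memcpy(&v, p, sizeof v);    /* unaligned-safe load */
        crc = __crc32d(crc, v);
        p += 8;
        len -= 8;
    }
    while (len--)
        crc = __crc32b(crc, *p++);
#else
    /* Portable fallback: one bit at a time, reflected polynomial 0xEDB88320. */
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
#endif
    return crc;
}
```

With a seed of `0xFFFFFFFF` and a final XOR with `0xFFFFFFFF` applied by the caller, the string "123456789" should give the standard CRC-32 check value `0xCBF43926` on either path. The CRC32C variant would use the `__crc32cd`/`__crc32cb` intrinsics in the same way.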

kjbracey2 • Jan 12 '22 11:01

I'm also preparing a patch to accelerate crc32_be. With them all done, we can get rid of about 32K of code and tables for the slice-by-8 software solution, which should more than pay for the size of enabling the arm-ce crypto.
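The hardware CRC instructions work on the bit-reflected (LE) form, so a BE routine cannot use them directly. One possible way to bridge the two, which is an assumption here and not necessarily what the pending patch does, is to bit-reverse the seed, each input byte, and the result. The portable sketch below demonstrates that identity; `crc32_le_byte` stands in for the hardware byte instruction, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word. */
static uint32_t bitrev32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* One LE byte step; a stand-in for the hardware CRC32 byte instruction. */
static uint32_t crc32_le_byte(uint32_t crc, unsigned char b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Bitwise BE reference: polynomial 0x04C11DB7, no final inversion. */
uint32_t crc32_be_ref(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int i = 0; i < 8; i++)
            crc = (crc << 1) ^ (0x04C11DB7u & (0u - (crc >> 31)));
    }
    return crc;
}

/* BE CRC computed with LE steps on bit-reversed seed and data. */
uint32_t crc32_be_via_le(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = bitrev32(crc);
    while (len--) {
        unsigned char rev = (unsigned char)(bitrev32(*p++) >> 24);
        crc = crc32_le_byte(crc, rev);
    }
    return bitrev32(crc);
}
```

Both functions should agree for any seed and input. A real implementation would reverse whole words around the wide CRC instructions and not go byte by byte.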

This version is also an earlier, less-pipelined version of the upstream code. Commit logs suggested that it would be slightly faster on A53 than the latest version, but it seems that may not be the case. I might update it after more tests.

Speed-up is nearly 8x for the LE ops, and over 5x for the BE ops.

kjbracey2 • Jan 13 '22 07:01

Updated to the Linux 4.21 version. It is about 35% faster on my RT-AX88U, despite the upstream changelog suggesting it would be slightly slower on A53.

Original test time: 170 µs
4.20 version time: 29 µs
4.21 version time: 21 µs

kjbracey2 • Jan 13 '22 16:01

I tested the changes; they run faster on my RT-AX88U.

jonathanmassehsj • Feb 23 '22 01:02

Is there a particular reason why the code is not making use of carry-less multiplication (using the pmull instruction)? On my RT-AC86U, /proc/cpuinfo does advertise that feature. Could some tricks from MariaDB/server#1652 be adopted? Obviously, we would want compile-time detection instead of runtime detection here.

Note: I am not too familiar with ARMv8 implementations or router SoCs. It might be that pmull is not supported by some ARMv8 SoCs that this code base is targeting.
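To make the compile-time detection idea concrete, here is a minimal userspace sketch of a carry-less multiply. It uses the `vmull_p64` intrinsic (which compiles to pmull) when the compiler advertises the feature through the ACLE macros `__ARM_FEATURE_AES` or the older `__ARM_FEATURE_CRYPTO`, and a portable loop otherwise. The name `clmul_lo64` and the `HAVE_PMULL` macro are hypothetical, and this is only the multiply primitive, not a CRC implementation. It also does not address what using NEON registers inside the kernel would require.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO)
#include <arm_neon.h>
#define HAVE_PMULL 1    /* hypothetical macro: pmull chosen at build time */
#else
#define HAVE_PMULL 0
#endif

/* Low 64 bits of the carry-less (polynomial over GF(2)) product of a and b. */
uint64_t clmul_lo64(uint64_t a, uint64_t b)
{
#if HAVE_PMULL
    /* 64x64 -> 128-bit polynomial multiply; lane 0 holds the low half. */
    poly128_t r = vmull_p64((poly64_t)a, (poly64_t)b);
    return vgetq_lane_u64(vreinterpretq_u64_p128(r), 0);
#else
    /* Portable fallback: shift-and-XOR, i.e. multiplication without carries. */
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1u)
            r ^= a << i;
    return r;
#endif
}
```

For example, `clmul_lo64(3, 3)` is 5, because (x + 1)(x + 1) = x^2 + 1 when the coefficients are reduced modulo 2.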

dr-m • Jul 15 '22 16:07