crypto/cipher: assembly for arm

Open kkoogqw opened this issue 5 years ago • 2 comments

What version of Go are you using (go version)?

$ go version 1.14.4

While profiling AES-CBC on amd64 and arm64, I found that `func xorBytes(dst, a, b []byte) int` and `func safeXORBytes(dst, a, b []byte, n int)` (in crypto/cipher/xor_generic.go) consistently appear in the top 15 of the pprof listing on arm64. On amd64, by contrast, the same operation uses SSE2 SIMD instructions in `func xorBytesSSE2(dst, a, b *byte, n int)`.
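For context, the generic path amounts to a byte-at-a-time XOR loop. The following is a simplified sketch of that idea, not the standard library source; the name `xorBytesSketch` is hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// xorBytesSketch is a hypothetical, simplified stand-in for a generic
// (non-SIMD) XOR routine. It XORs the first min(len(a), len(b)) bytes
// of a and b into dst and returns the number of bytes processed.
// dst must be at least that long, otherwise the loop panics.
func xorBytesSketch(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	// One load, one XOR, and one store per byte: this is the loop a
	// SIMD implementation would replace with wide vector operations.
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := []byte{0x0f, 0xf0, 0xaa, 0x55}
	b := []byte{0xff, 0xff, 0xaa}
	dst := make([]byte, len(a))
	n := xorBytesSketch(dst, a, b)
	fmt.Printf("xored %d bytes: %x\n", n, dst[:n])
}
```

In CBC mode an XOR of this kind runs once per 16-byte block, so the per-byte loop overhead adds up over a large message.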

```bash
(pprof) top10
Showing nodes accounting for 700ms, 55.12% of 1270ms total
Showing top 10 nodes out of 113
      flat  flat%   sum%        cum   cum%
     170ms 13.39% 13.39%      530ms 41.73%  runtime.mallocgc
      90ms  7.09% 20.47%       90ms  7.09%  crypto/cipher.safeXORBytes
      90ms  7.09% 27.56%      130ms 10.24%  syscall.Syscall
      80ms  6.30% 33.86%       80ms  6.30%  runtime.nextFreeFast (inline)
      60ms  4.72% 38.58%       60ms  4.72%  runtime.publicationBarrier
      50ms  3.94% 42.52%       50ms  3.94%  crypto/aes.expandKeyAsm
      50ms  3.94% 46.46%      140ms 11.02%  crypto/cipher.xorBytes
      40ms  3.15% 49.61%       40ms  3.15%  runtime.acquirem (inline)
      40ms  3.15% 52.76%       40ms  3.15%  runtime.memclrNoHeapPointers
      30ms  2.36% 55.12%       30ms  2.36%  crypto/internal/subtle.InexactOverlap
```

Could we use arm64 SIMD instructions to optimize this function?
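To illustrate why wider operations help, here is a portable Go sketch that XORs 8 bytes per iteration instead of 1. It is only a stand-in for the idea: it is not NEON assembly and not the implementation from any CL, and the name `xorWordsSketch` is hypothetical. A real SIMD version would process 16 or more bytes per instruction.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorWordsSketch is a hypothetical illustration of XORing in wide
// chunks. It handles 8 bytes per iteration via uint64 loads and
// stores, then finishes any remaining tail one byte at a time.
// dst must hold at least min(len(a), len(b)) bytes.
func xorWordsSketch(dst, a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	// Wide loop: the same byte order is used for loads and stores,
	// so the result is identical to a byte-wise XOR on any platform.
	for ; i+8 <= n; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x)
	}
	// Tail loop for the final n % 8 bytes.
	for ; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

func main() {
	a := make([]byte, 19)
	b := make([]byte, 19)
	for i := range a {
		a[i] = byte(i*7 + 1)
		b[i] = byte(i*13 + 5)
	}
	dst := make([]byte, len(a))
	n := xorWordsSketch(dst, a, b)
	fmt.Printf("xored %d bytes: %x\n", n, dst[:n])
}
```

Whether a change like this actually moves the profile would need to be measured on the arm64 machine in question.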

kkoogqw avatar Oct 16 '20 08:10 kkoogqw

Change https://golang.org/cl/142537 mentions this issue: crypto/cipher: use Neon for xor on arm64

gopherbot avatar Oct 19 '20 01:10 gopherbot

See my PR #53154 which adds non-NEON and NEON implementations of xorBytes for ARM. This bridges the gap with ARM64.

adriancable avatar May 31 '22 16:05 adriancable