blake2b-opt
AVX2-64 code seems to be broken
I tried to compile the code on an OS X system, but compilation failed with the following error:
clang -cc1as: fatal error: error in backend: 32-bit absolute addressing is not supported in 64-bit mode
I tried to fix it by switching to %rip-relative addressing, applying a patch like this one: https://gist.github.com/vstakhov/37442eaf04ebfdd315e0 but although the patched code compiles, it crashes with a core dump:
Process 90423 stopped
* thread #1: tid = 0x2147d4b, 0x0000000100007200 blake2b-util`.Lblake2b_blocks_avx2_11, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x18100800)
frame #0: 0x0000000100007200 blake2b-util`.Lblake2b_blocks_avx2_11
blake2b-util`.Lblake2b_blocks_avx2_11:
-> 0x100007200 <+0>: movzbl (%rax), %ebx
0x100007203 <+3>: addq $0x10, %rax
0x100007207 <+7>: movzbl -0xc(%rax), %r13d
0x10000720c <+12>: movzbl -0xe(%rax), %r11d
Registers content:
General Purpose Registers:
rax = 0x0000000018100800
rbx = 0x0000000000000000
rcx = 0x0000000000000000
rdx = 0x0000000000000000
rdi = 0x00007fff5fbff8e0
rsi = 0x00007fff5fbff4c0
rbp = 0x00007fff5fbff590
rsp = 0x00007fff5fbff4f8
r8 = 0xffffffffffffffff
r9 = 0xffffffffffffffff
r10 = 0x0000000000000000
r11 = 0x0000000000000000
r12 = 0x0000000100006900 blake2b-util`blake2b_constants
r13 = 0x0000000100006a00 blake2b-util`blake2b_constants_ssse3
r14 = 0x00007fff5fbff930
r15 = 0x0000000000000000
rip = 0x0000000100007200 blake2b-util`.Lblake2b_blocks_avx2_11
rflags = 0x0000000000010286
cs = 0x000000000000002b
fs = 0x0000000000000000
gs = 0x0000000000000000
The other extensions work fine under fuzz testing.
That helped, thank you.
bin/blake2b-util bench
time granularity: 24 cycles, 2195297384 cycles/second
1 byte(s):
avx2, 396.00 cycles per call, 396.0000 cycles/byte
avx, 333.00 cycles per call, 333.0000 cycles/byte
x86, 356.00 cycles per call, 356.0000 cycles/byte
generic/64, 586.00 cycles per call, 586.0000 cycles/byte
128 byte(s):
avx2, 389.00 cycles per call, 3.0391 cycles/byte
avx, 316.00 cycles per call, 2.4688 cycles/byte
x86, 353.00 cycles per call, 2.7578 cycles/byte
generic/64, 581.00 cycles per call, 4.5391 cycles/byte
576 byte(s):
avx2, 1416.00 cycles per call, 2.4583 cycles/byte
avx, 1474.00 cycles per call, 2.5590 cycles/byte
x86, 1648.00 cycles per call, 2.8611 cycles/byte
generic/64, 2450.00 cycles per call, 4.2535 cycles/byte
8192 byte(s):
avx2, 16426.00 cycles per call, 2.0051 cycles/byte
avx, 18352.00 cycles per call, 2.2402 cycles/byte
x86, 20888.00 cycles per call, 2.5498 cycles/byte
generic/64, 28548.00 cycles per call, 3.4849 cycles/byte
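For anyone comparing these numbers, here is a minimal sketch that re-derives cycles/byte as cycles per call divided by input size and picks the fastest implementation per size. The helper names (`parse_bench`, `fastest`) and the line format they match are my own assumptions based on the output above; they are not part of blake2b-util.

```python
import re

# A few lines copied from the bench output above.
SAMPLE = """\
128 byte(s):
avx2, 389.00 cycles per call, 3.0391 cycles/byte
avx, 316.00 cycles per call, 2.4688 cycles/byte
8192 byte(s):
avx2, 16426.00 cycles per call, 2.0051 cycles/byte
avx, 18352.00 cycles per call, 2.2402 cycles/byte
generic/64, 28548.00 cycles per call, 3.4849 cycles/byte
"""


def parse_bench(text):
    """Hypothetical parser: returns {size: {impl: cycles_per_call}}."""
    results, size = {}, None
    for line in text.splitlines():
        header = re.match(r"(\d+) byte\(s\):", line)
        if header:
            size = int(header.group(1))
            results[size] = {}
            continue
        row = re.match(r"([\w/]+), ([\d.]+) cycles per call", line)
        if row and size is not None:
            results[size][row.group(1)] = float(row.group(2))
    return results


def fastest(results):
    """Implementation with the fewest cycles per call at each input size."""
    return {size: min(impls, key=impls.get) for size, impls in results.items()}


if __name__ == "__main__":
    parsed = parse_bench(SAMPLE)
    for size, impls in parsed.items():
        for impl, cycles in impls.items():
            # cycles/byte is simply cycles per call divided by the input size
            print(f"{size:5d} B  {impl:<11} {cycles / size:.4f} cycles/byte")
    print(fastest(parsed))
```

On the sample lines, avx comes out ahead at 128 bytes and avx2 at 8192 bytes, which matches the table above.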
Looks like #3 corrects this issue.