
New x86-64 AVX2 assembly implementation of get_checksum1() crashed with SIGILL

Open LynxChaus opened this issue 3 years ago • 2 comments

After merging #174, rsync fails its tests with a simple crash:

Thread 2.1 "rsync" received signal SIGILL, Illegal instruction.
get_checksum1_avx2 () at ./simd-checksum-avx2.S:18
18		vmovntdqa ymm7, [rax]   # load T2 multiplication constants
(gdb) bt
#0  get_checksum1_avx2 () at ./simd-checksum-avx2.S:18
#1  0x00005555555a32b6 in get_checksum1 ()
#2  0x000055555556f2af in generate_and_send_sums (f_copy=<optimized out>, f_out=1, len=967816, fd=0) at generator.c:798
#3  recv_generator (fname=fname@entry=0x7fffffff9f20 "dir/text", file=file@entry=0x7ffff77d7ee8, ndx=<optimized out>, itemizing=itemizing@entry=1, code=code@entry=FLOG, f_out=f_out@entry=1) at generator.c:1950
#4  0x0000555555570ccd in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#5  0x000055555557e23c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1102
#6  0x000055555557e994 in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1215
#7  start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1249
#8  0x000055555557eaf5 in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1222
#9  0x00005555555a2ec5 in local_child (argc=2, argv=argv@entry=0x7fffffffb1f0, f_in=f_in@entry=0x7fffffffb150, f_out=f_out@entry=0x7fffffffb154, child_main=child_main@entry=0x55555557eae0 <child_main>) at pipe.c:166
#10 0x000055555555e742 in do_cmd (f_out_p=0x7fffffffb154, f_in_p=0x7fffffffb150, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:644
#11 start_client (argv=<optimized out>, argc=1) at main.c:1571
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:1831

(gdb) l
13		vmovd	xmm6,[rcx] # load *ps1
14		lea	eax, [rsi-128] # at least 128 bytes to process?
15		cmp	edx, eax
16		jg	.exit
17		lea	rax, .mul_T2[rip]
18		vmovntdqa ymm7, [rax]   # load T2 multiplication constants
19		vmovntdqa ymm12,[rax+32]# from memory.
20		vpcmpeqd  ymm15, ymm15, ymm15 # set all elements to -1.
21	
22	#if CHAR_OFFSET != 0
(gdb) info registers 
rax            0x5555555baa00      93824992651776
rbx            0x3d8               984
rcx            0x7fffffff7bc0      140737488321472
rdx            0x0                 0
rsi            0x3d8               984
rdi            0x7ffff7716010      140737344790544
rbp            0x7ffff7716010      0x7ffff7716010
rsp            0x7fffffff7ba8      0x7fffffff7ba8
r8             0x7fffffff7bc4      140737488321476
r9             0x0                 0
r10            0x22                34
r11            0x246               582
r12            0x7fffffff7bc0      140737488321472
r13            0x7fffffff7bc4      140737488321476
r14            0x7ffff7716010      140737344790544
r15            0xffffffff          4294967295
rip            0x5555555a35b6      0x5555555a35b6 <get_checksum1_avx2+22>
eflags         0x10293             [ CF AF SF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
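For readers following the assembly: get_checksum1() computes rsync's weak rolling checksum, and the AVX2 routine is a vectorized version of that sum. Below is a minimal byte-at-a-time sketch of the same quantity. It assumes CHAR_OFFSET is 0 (so the listing's "#if CHAR_OFFSET != 0" branch is skipped) and that bytes are summed as signed char. The name weak_checksum is made up for illustration, and rsync's own C loop is unrolled rather than written this way. Because s2 weights each byte by its distance from the end of the block, a SIMD version needs per-position multipliers, which is presumably what the .mul_T2 constants loaded at line 18 hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: CHAR_OFFSET is 0, so no per-byte offset is added. */
#define CHAR_OFFSET 0

/* Byte-at-a-time weak rolling checksum (illustrative, not rsync's code).
 * s1 is the running sum of the bytes; s2 is the sum of the running s1
 * values, so byte i of an n-byte block contributes (n - i) times to s2. */
static uint32_t weak_checksum(const char *buf1, size_t len)
{
    /* Bytes are interpreted as signed char, so 0x80..0xff count as negative. */
    const signed char *buf = (const signed char *)buf1;
    uint32_t s1 = 0, s2 = 0;

    assert(buf1 != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        s1 += (uint32_t)(buf[i] + CHAR_OFFSET); /* negative values wrap mod 2^32 */
        s2 += s1;
    }
    /* Low 16 bits of s1 in the low half, s2 shifted into the high half. */
    return (s1 & 0xffff) + (s2 << 16);
}
```

For example, weak_checksum("abc", 3) ends with s1 = 294 and s2 = 586, giving 294 + (586 << 16) = 38404390.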

The mapbuf in generate_and_send_sums() seems to be properly aligned to 128 bits:

(gdb) print *mapbuf 
$28 = {file_size = 968800, p_offset = 0, p_fd_offset = 263168, 
  p = 0x7ffff7716010 "/*\n * Routines to authenticate access to a daemon (hosts allow/deny).\n *\n * Copyright (C) 1998 Andrew Tridgell\n * Copyright (C) 2004-2022 Wayne Davison\n *\n * This program is free software; you can red"..., p_size = 263168, p_len = 263168, def_window_size = 263168, fd = 0, status = 0}
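The alignment claim can be checked with plain integer arithmetic on the addresses from the session. The sketch below uses a made-up helper, misalignment(), that reports how far an address sits past an alignment boundary. mapbuf->p = 0x7ffff7716010 is 16-byte (128-bit) aligned but 16 bytes past a 32-byte boundary, while rax = 0x5555555baa00, the .mul_T2 address that the faulting "vmovntdqa ymm7, [rax]" actually reads, is 32-byte aligned. Alignment is probably not the culprit anyway: on x86-64 Linux a misaligned vmovntdqa operand should raise a general-protection fault (delivered as SIGSEGV), whereas SIGILL comes from an invalid-opcode exception, for example when the CPU or OS does not support AVX2, which the 256-bit form of vmovntdqa requires.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes by which addr sits past the previous multiple of align.
 * align must be a power of two; for vmovntdqa it is 16 for an xmm operand
 * and 32 for a ymm operand. (Hypothetical helper for illustration.) */
static uint64_t misalignment(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (align - 1);
}
```

Applied to the two addresses above, misalignment(0x7ffff7716010ULL, 32) is 16 and misalignment(0x5555555baa00ULL, 32) is 0.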

LynxChaus, Feb 01 '22 23:02

Please try out the latest code (with a fresh configure run). It will now default to not using the asm version of the rolling-checksum code. If you want to try the asm code in the future, a configure run with --enable-roll-asm will enable it in the build.
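Making the asm opt-in at build time avoids the crash by default. Purely as an illustration of a complementary approach, here is a sketch that pairs an opt-in build switch with a run-time CPU check, using the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports("avx2"). The macro HYPOTHETICAL_ROLL_ASM and the functions checksum_portable and checksum_avx2_asm are made up for this sketch; it is not rsync's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch, standing in for whatever a configure option
 * like --enable-roll-asm defines; NOT rsync's real macro name. */
#ifndef HYPOTHETICAL_ROLL_ASM
#define HYPOTHETICAL_ROLL_ASM 0
#endif

typedef uint32_t (*checksum_fn)(const char *buf, size_t len);

/* Portable fallback: byte-at-a-time weak checksum, assuming CHAR_OFFSET == 0. */
static uint32_t checksum_portable(const char *buf1, size_t len)
{
    const signed char *buf = (const signed char *)buf1;
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += (uint32_t)buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) + (s2 << 16);
}

#if HYPOTHETICAL_ROLL_ASM
/* Hypothetical: would be defined in a separate assembly file. */
uint32_t checksum_avx2_asm(const char *buf, size_t len);
#endif

/* Choose an implementation at run time, so an asm-enabled binary still
 * runs on CPUs without AVX2 instead of dying with SIGILL. */
static checksum_fn select_checksum(void)
{
#if HYPOTHETICAL_ROLL_ASM && defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return checksum_avx2_asm;
#endif
    return checksum_portable;
}
```

With the switch left at 0 this compiles on its own and always returns the portable routine; setting it to 1 additionally requires linking an assembly file that defines checksum_avx2_asm.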

WayneD, Mar 04 '22 01:03

Same here when we tried --enable-roll-asm; we're trying to get maximum performance for syncs of some multi-TB files... 😅

futureweb, Sep 13 '23 15:09