Floating point exception

Open AdrienLemaire opened this issue 11 years ago • 9 comments

Updated cudaminer with latest commit this morning, and it broke:

[2014-01-12 10:32:07] 1 miner threads started, using 'scrypt' algorithm.
[2014-01-12 10:32:07] Starting Stratum on stratum+tcp://eac.us1.hackshard.com:3333
[2014-01-12 10:32:08] Stratum detected new block
[2014-01-12 10:32:08] GPU #0: GeForce GTX 675M with compute capability 2.1
[2014-01-12 10:32:08] GPU #0: interactive: 1, tex-cache: 1D, single-alloc: 1
[2014-01-12 10:32:08] GPU #0: Performing auto-tuning (Patience...)
[2014-01-12 10:32:08] GPU #0: maximum warps: 502
[2014-01-12 10:32:08] GPU #0:    0.00 khash/s with configuration F0x0
[2014-01-12 10:32:08] GPU #0: using launch configuration F0x0
[1]    29067 floating point exception (core dumped)  cudaminer -H 2 -d 0 -i 1,0,0 -l auto,K4x16 -C 1 -o  -O

AdrienLemaire avatar Jan 11 '14 23:01 AdrienLemaire

I too have been getting this error since I updated. I've had to manually play around with configurations to get it working.

Jamonek avatar Jan 12 '14 01:01 Jamonek

Got a backtrace. WARPS_PER_BLOCK is 0 for some reason. I've been having this problem sporadically, too. Git commit e0c7371a1efeb1c2164eddef2430e07cbd3eeae8

$ gdb cudaminer
GNU gdb (GDB) 7.6.1 (Debian 7.6.1-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/dan/projects/CudaMiner/cudaminer...done.
(gdb) r -c ~/.config/cudaminer
Starting program: /home/dan/projects/CudaMiner/cudaminer -c ~/.config/cudaminer
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2013-12-18 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm

[New Thread 0x7ffff2651700 (LWP 4204)]
[New Thread 0x7ffff1e50700 (LWP 4205)]
[2014-01-12 00:53:11] Starting Stratum on stratum+tcp://doge.netcodepool.org:4095
[New Thread 0x7ffff164f700 (LWP 4206)]
[2014-01-12 00:53:11] 1 miner threads started, using 'scrypt' algorithm.
[New Thread 0x7ffff0e4e700 (LWP 4207)]
[Thread 0x7ffff0e4e700 (LWP 4207) exited]
[2014-01-12 00:53:11] Stratum detected new block
[New Thread 0x7ffff0e4e700 (LWP 4208)]
[2014-01-12 00:53:12] GPU #0: GeForce GTX 570 with compute capability 2.0
[2014-01-12 00:53:12] GPU #0: interactive: 1, tex-cache: 0 , single-alloc: 0
[2014-01-12 00:53:12] GPU #0: Performing auto-tuning (Patience...)
[2014-01-12 00:53:12] GPU #0: maximum warps: 267
[2014-01-12 00:53:12] GPU #0: 0.00 khash/s with configuration F0x0
[2014-01-12 00:53:12] GPU #0: using launch configuration F0x0

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff164f700 (LWP 4206)]
0x000000000041fa68 in cuda_scrypt_core (thr_id=0, stream=0, N=1024) at salsa_kernel.cu:790
790 dim3 grid(WU_PER_LAUNCH/WU_PER_BLOCK, 1, 1);
(gdb) p WARPS_PER_BLOCK
$1 = 0
(gdb) p context_wpb[thr_id]
Could not find operator[].
(gdb) thread apply all bt

Thread 6 (Thread 0x7ffff0e4e700 (LWP 4208)):
#0 0x00007ffff6a1095d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff2dc7e93 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#2 0x00007ffff283ea95 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#3 0x00007ffff2dc9d29 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#4 0x00007ffff7958e0e in start_thread (arg=0x7ffff0e4e700) at pthread_create.c:311
#5 0x00007ffff6a1c0fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7ffff164f700 (LWP 4206)):
#0 0x000000000041fa68 in cuda_scrypt_core (thr_id=0, stream=0, N=1024) at salsa_kernel.cu:790
#1 0x00000000004163e2 in scanhash_scrypt (thr_id=thr_id@entry=0, pdata=pdata@entry=0x7ffff164eca0, ptarget=ptarget@entry=0x7ffff164ed20, max_nonce=max_nonce@entry=4095, hashes_done=hashes_done@entry=0x7ffff164ebe8) at scrypt.cpp:759
#2 0x0000000000407439 in miner_thread (userdata=) at cpu-miner.c:820
#3 0x00007ffff7958e0e in start_thread (arg=0x7ffff164f700) at pthread_create.c:311
#4 0x00007ffff6a1c0fd in clone ()
---Type to continue, or q to quit---
   at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7ffff1e50700 (LWP 4205)):
#0 0x00007ffff6a14f53 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000000000408818 in socket_full (sock=, timeout=) at util.c:631
#2 0x00000000004090b3 in stratum_socket_full ( sctx=sctx@entry=0x798460 , timeout=timeout@entry=120) at util.c:638
#3 0x0000000000407793 in stratum_thread (userdata=) at cpu-miner.c:1051
#4 0x00007ffff7958e0e in start_thread (arg=0x7ffff1e50700) at pthread_create.c:311
#5 0x00007ffff6a1c0fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7ffff2651700 (LWP 4204)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000040b130 in tq_pop (tq=0x7d2c90, abstime=abstime@entry=0x0) at util.c:1295
#2 0x0000000000407d4b in workio_thread (userdata=0x7d2c38) at cpu-miner.c:570
#3 0x00007ffff7958e0e in start_thread (arg=0x7ffff2651700)
---Type to continue, or q to quit---
   at pthread_create.c:311
#4 0x00007ffff6a1c0fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7ffff7fba7c0 (LWP 4200)):
#0 0x00007ffff7959ff8 in pthread_join (threadid=140737251706624, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1 0x0000000000403bd1 in main (argc=, argv=) at cpu-miner.c:1638
(gdb) c
Continuing.
[Thread 0x7ffff0e4e700 (LWP 4208) exited]
[Thread 0x7ffff164f700 (LWP 4206) exited]
[Thread 0x7ffff1e50700 (LWP 4205) exited]
[Thread 0x7ffff2651700 (LWP 4204) exited]

Program terminated with signal SIGFPE, Arithmetic exception.
The program no longer exists.
(gdb) q

dchokola avatar Jan 12 '14 06:01 dchokola
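
The backtrace stops on an integer division in `dim3 grid(WU_PER_LAUNCH/WU_PER_BLOCK, 1, 1);` while `WARPS_PER_BLOCK` is 0, which would explain the SIGFPE if `WU_PER_BLOCK` is derived from `WARPS_PER_BLOCK` (an assumption; the macro definitions are not shown in this thread). A minimal host-side sketch of guarding that division follows. `LaunchConfig`, `kWuPerWarp`, and `grid_size` are hypothetical names for illustration, not CudaMiner code.

```cpp
// Hypothetical stand-in for a launch configuration such as "F0x0":
// <blocks> x <warps per block>.
struct LaunchConfig {
    int blocks;
    int warps_per_block;
};

// Assumed value for illustration only: work units processed by one warp.
constexpr int kWuPerWarp = 32;

// Returns the number of blocks for the grid, or 0 when the configuration
// is degenerate and the division would otherwise trap with SIGFPE.
int grid_size(const LaunchConfig& cfg) {
    const int wu_per_block = kWuPerWarp * cfg.warps_per_block;
    if (cfg.blocks <= 0 || wu_per_block <= 0) {
        return 0;  // e.g. autotune reported 0x0; the caller should not launch
    }
    const int wu_per_launch = cfg.blocks * wu_per_block;
    return wu_per_launch / wu_per_block;
}
```

A caller would treat a return value of 0 as "no usable configuration" and report an error instead of launching the kernel.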

I get that too occasionally. Probably the cudaGetLastError() is not cleared before entering the autotune algorithm, making it terminate early.

cbuchner1 avatar Jan 12 '14 20:01 cbuchner1
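
To illustrate the hypothesis above: `cudaGetLastError()` returns the last error recorded by the runtime and resets it to `cudaSuccess`, so an error left over from an earlier call can be picked up by the first check inside a tuning loop. The sketch below models this without the CUDA runtime. `FakeRuntime`, `Err`, and `autotune` are hypothetical stand-ins, and the loop is a simplification rather than CudaMiner's actual autotune code.

```cpp
// Stand-in for the runtime's "last error" slot (hypothetical, no CUDA needed).
enum class Err { Success, LaunchFailed };

struct FakeRuntime {
    Err last = Err::Success;

    // Mimics cudaGetLastError(): return the stored error and reset it.
    Err get_last_error() {
        Err e = last;
        last = Err::Success;
        return e;
    }

    // Pretend kernel launch: fails only when too many warps are requested.
    void launch(int warps, int max_warps) {
        if (warps > max_warps) last = Err::LaunchFailed;
    }
};

// Simplified tuning loop: returns the largest warp count that launched
// without error, or 0 if the very first check already reported an error.
int autotune(FakeRuntime& rt, int max_warps, bool clear_stale_error) {
    if (clear_stale_error) {
        rt.get_last_error();  // discard anything left over from earlier calls
    }
    int best = 0;
    for (int warps = 1; warps <= max_warps; ++warps) {
        rt.launch(warps, max_warps);
        if (rt.get_last_error() != Err::Success) break;
        best = warps;
    }
    return best;
}
```

With a stale error in the slot and no clearing step, the loop gives up on its first iteration and returns 0, which would be consistent with the `F0x0` configuration in the logs. Clearing first lets the loop run to completion.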

Any idea how to fix this yet?

Jamonek avatar Jan 15 '14 23:01 Jamonek

Is the problem gone now? I have made some code changes after running into this problem myself...

cbuchner1 avatar Jan 16 '14 11:01 cbuchner1

I pulled this morning and the problem is gone for me now. Thanks!

alanmcintyre avatar Jan 16 '14 13:01 alanmcintyre

Still here for me, although only on the Kepler GPU.

Edit: no, somehow it's working now.

ccomly avatar Jan 16 '14 18:01 ccomly

Just cloned the latest commit and it is not working.

Jamonek avatar Jan 16 '14 19:01 Jamonek

Still getting the floating point exceptions on 142261bb89144364873ab4c70772a96d16647966

Update: fixed for me with a more recent checkin.

joeyo avatar Jan 21 '14 18:01 joeyo