hashkill
Strange compile error
After running ./configure and make, I got the following error:
Compiling nvidia_bfunix without flags...
clBuildProgram(): CL_INVALID_BINARY
Log:
==============
ptxas error : Entry function 'bfunix' uses too much shared data (0x8070 bytes + 0x10 bytes system, 0x4000 max)
I am not sure whether I am doing something wrong here (should I call make without some of the features?) or whether this is a genuine issue. I am currently using Ubuntu 12.10.
Strangely enough, I am using Arch Linux at home and was able to compile it there. Please let me know what you think.
Hello,
Could you please list the GPUs installed in both machines and, if possible, the driver versions?
OK, sorry for the huge delay; I completely forgot about this issue :(
Here is a gist with the lspci -vv output from the working and the non-working machine. Tell me if you need anything else.
I now see the problem. The nvidia bfunix kernel indeed uses 32KB of local memory while the oldest supported Nvidia devices have just 16KB per compute unit. Interestingly though, the cross-compiler happily compiles the sm10 kernel binaries on my system.
The solution will be to either have a separate sm10 kernel or disable bfunix for old Nvidia GPUs. Disabling is probably the better option, as it would be very slow anyway, quite likely much slower than on a modern CPU.
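To make the numbers in the ptxas message concrete: 0x8070 is 32880 bytes, plus 0x10 (16) bytes of system data gives 32896 bytes, a little over 32 KiB, while the limit 0x4000 is 16384 bytes (16 KiB). The following is only a small sketch of that arithmetic in C; the macro and function names are made up for illustration and are not part of hashkill.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the ptxas error message in this thread. */
#define BFUNIX_SHARED_BYTES 0x8070u /* shared data used by the bfunix kernel */
#define SYSTEM_SHARED_BYTES 0x10u   /* extra bytes reserved by the system */
#define DEVICE_SHARED_MAX   0x4000u /* limit reported by ptxas (16 KiB) */

/* Hypothetical helper: returns 1 if kernel + system usage fits within
 * the limit, 0 otherwise. */
static int fits_in_shared(size_t kernel_bytes, size_t system_bytes,
                          size_t max_bytes)
{
    assert(max_bytes > 0);
    return kernel_bytes + system_bytes <= max_bytes;
}
```

With the values above, fits_in_shared(BFUNIX_SHARED_BYTES, SYSTEM_SHARED_BYTES, DEVICE_SHARED_MAX) returns 0, which matches the build failure on the 16 KiB device.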
Well, as far as I understand, the GPU in this desktop machine is pretty weak and the code assumes it has more memory than it does? If that is the case, can you provide a compile switch to disable GPU support altogether, or give me a hint about where I can disable it?
I will be glad to provide a patch if I can ;)
Best, Nikola
Simplest workaround would be to open src/kernels/nvidia_bfunix.cl and put at the top:
#ifndef SM10
and at the bottom:
#endif
Not the best solution (no warning about bcrypt not supported on that gpu) but at least it would compile.
Fixed in git now.
I had pretty much the same error compiling version 0.3.1:
Compiling nvidia_bfunix without flags...
Log:
==============
ptxas error : Entry function 'bfunix' uses too much shared data (0x8070 bytes + 0x10 bytes system, 0x4000 max)
Here is my VGA's "lspci -vv": https://gist.github.com/gabrielmagno/7257498
That's not good. It looks like some non-sm_10 devices also lack enough shared memory. Please use the #ifndef workaround until I implement a proper fix for that (which, as minor as it sounds, won't happen in the next few days; I am preoccupied with the a51 stuff right now :( ).
I created this patch, which fixed the issue (at least for my device): https://gist.github.com/gabrielmagno/7265841
I used clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE to get the amount of local memory. Then I introduced the flag LOCMEM16K, which is added to the compiler flags if the device has exactly 16K of local memory. Finally, I inserted an #ifndef LOCMEM16K in nvidia_bfunix.cl.
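For readers who do not want to open the gist, the flag-selection step could look roughly like the sketch below. This is not the actual patch: the helper name add_locmem_flag is hypothetical, and passing the flag as -DLOCMEM16K in the build options is an assumption. To keep the sketch compilable without OpenCL headers, the local memory size is passed in as a plain number; in a real host program it would come from clGetDeviceInfo, as noted in the comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOCMEM_16K (16u * 1024u)

/* Hypothetical helper, not taken from the patch.
 * local_mem_bytes would normally be filled in by
 *   clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
 *                   sizeof(cl_ulong), &local_mem_bytes, NULL);
 * Appends "-DLOCMEM16K" to the build options when the device has
 * exactly 16K of local memory.
 * Returns 1 if the flag was added, 0 if it was not needed,
 * -1 if the buffer is too small (flags is left unchanged). */
static int add_locmem_flag(unsigned long long local_mem_bytes,
                           char *flags, size_t cap)
{
    assert(flags != NULL);
    if (local_mem_bytes != LOCMEM_16K)
        return 0;

    size_t used = strlen(flags);
    if (used >= cap)
        return -1;

    int n = snprintf(flags + used, cap - used, "%s-DLOCMEM16K",
                     used ? " " : "");
    if (n < 0 || (size_t)n >= cap - used) {
        flags[used] = '\0'; /* undo a truncated write */
        return -1;
    }
    return 1;
}
```

The resulting string would then be passed as the options argument of clBuildProgram, so that the #ifndef LOCMEM16K guard in the kernel source skips bfunix on those devices. The exact-equality test mirrors the description above; comparing against a threshold instead would be a different design choice that the thread does not discuss.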
Great job!
I will integrate it as soon as I have some time (perhaps tonight or tomorrow).
Thanks!