CCMpred
Bug in CCMpred (CUDA)
Running a CUDA-compiled build of CCMpred on a sequence alignment sometimes crashes. The error output is:
adenine: felix > ccmpred alignments/1bdo.jones 1bdo.mat
Found 1 CUDA devices, using device #0: Quadro K4000
Total GPU RAM: 3,217,752,064
Free GPU RAM: 2,617,708,544
Needed GPU RAM: 792,606,940 ✓
CUDA error No. 0 in /opt/CCMpred/src/evaluate_cuda_kernels.cu at line 819
Running the same command with the flag -t 2 works fine.
Hi Felix, I don't have access to a suitable GPU/computer combination to debug this at the moment so I'm afraid that I will not be able to help 😞
No worries, the CPU version works fine so there's no rush. Just thought I'd report it ...
I encountered a similar error. The cause seems to be that I fed CCMpred too many sequences (~70k). (The error code I got was 6.) Separately, the macro CHECK_ERR(err), defined in include/evaluate_cuda_kernels.h and lib/libconjugrad/include/conjugrad_kernels.h (and maybe other files), can end up calling cudaGetLastError() more than once after expansion, as happens at the call sites in src/evaluate_cuda_kernels.cu. According to http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__ERROR.html#group__CUDART__ERROR_1g3529f94cb530a83a76613616782bd233, cudaGetLastError() resets the error to cudaSuccess after returning it, so by the time the code is printed it has already been reset. That is why we always get "CUDA error No. 0". Something like https://codeyarns.com/2011/03/02/how-to-do-error-checking-in-cuda/ may be a solution.
This issue is still present: the real error code is hidden and No. 0 is always shown.
The cause is the error check
CHECK_ERR(cudaGetLastError());
which is not a function but a preprocessor macro defined as
#define CHECK_ERR(err) {if (cudaSuccess != (err)) { printf("CUDA error No. %d in %s at line %d\n", (err), __FILE__, __LINE__); exit(EXIT_FAILURE); } }
in evaluate_cuda_kernels.h, line 9. It therefore expands to two calls to cudaGetLastError(): the first, in the comparison, consumes the actual error code, and the second, inside printf, returns cudaSuccess.
I suggest changing the macro to
#define CHECK_ERR(err) { int e = (err); if (cudaSuccess != e) { printf("CUDA error No. %d in %s at line %d\n", e, __FILE__, __LINE__); exit(EXIT_FAILURE); } }
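To make the double evaluation easy to see without a GPU, here is a minimal sketch in plain C. Everything prefixed with mock_ or REPORT_ is a hypothetical stand-in, not part of CCMpred or the CUDA runtime: mock_get_last_error() only imitates the documented "return the error, then reset it" behaviour of cudaGetLastError(), and the two macros have the same shape as the current and the proposed CHECK_ERR but store the code they would print instead of exiting.

```c
#include <stdio.h>

/* Hypothetical stand-ins for cudaError_t and cudaSuccess. The value 6 is
   just the error code mentioned earlier in this thread. */
typedef enum { mockSuccess = 0, mockErrorExample = 6 } mock_error_t;

static mock_error_t pending_error = mockSuccess;

/* Imitates cudaGetLastError(): returns the pending error and resets it. */
static mock_error_t mock_get_last_error(void) {
    mock_error_t e = pending_error;
    pending_error = mockSuccess;
    return e;
}

/* Same shape as the current CHECK_ERR: `err` appears twice, so a function
   call passed as `err` is executed twice. */
#define REPORT_BUGGY(err, out) \
    { if (mockSuccess != (err)) { *(out) = (int)(err); } }

/* Same shape as the proposed CHECK_ERR: `err` is evaluated once and cached. */
#define REPORT_FIXED(err, out) \
    { int e = (err); if (mockSuccess != e) { *(out) = e; } }

/* Returns the code the buggy macro would print (-1 if nothing is reported). */
int reported_by_buggy_macro(void) {
    int reported = -1;
    pending_error = mockErrorExample;   /* simulate a failed kernel launch */
    REPORT_BUGGY(mock_get_last_error(), &reported);
    return reported;
}

/* Returns the code the fixed macro would print (-1 if nothing is reported). */
int reported_by_fixed_macro(void) {
    int reported = -1;
    pending_error = mockErrorExample;   /* simulate a failed kernel launch */
    REPORT_FIXED(mock_get_last_error(), &reported);
    return reported;
}
```

With the simulated error set to 6, reported_by_buggy_macro() returns 0 and reported_by_fixed_macro() returns 6, which matches the "CUDA error No. 0" output seen above. In the real macro, declaring the cached value as cudaError_t rather than int would presumably work just as well.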