caldgemm
Portable and Flexible DGEMM Library for GPUs (OpenCL, CUDA, CAL) with special support for HPL
caldgemm issues (2)
I compiled it successfully and use CUDA as the backend. When I execute the command `./dgemm_bench -O 2 -p -A -B -m 40960 -n 40960`, it always produces such...
call fast kernel cuBLAS dgemmNT; copy C before copy+transpose of A and B (fix for pipeline blocked by transpose kernel); call transpose only when needed, otherwise copy data directly to dest_image; destroy cublas...