caldgemm

Portable and Flexible DGEMM Library for GPUs (OpenCL, CUDA, CAL) with special support for HPL

2 caldgemm issues

I compiled it successfully with CUDA as the backend. When I execute the command `./dgemm_bench -O 2 -p -A -B -m 40960 -n 40960`, it always poses such...

call fast kernel cublas dgemmNT
Copy C before Copy+transpose A+B (fix for pipeline blocked by transpose kernel)
call transpose only when needed, otherwise copy data directly to dest_image
destroy cublas...