TeslaCoder

Results: 1 issue by TeslaCoder

Call fast kernel cublas dgemmNT. Copy C before copy+transpose of A and B (fix for pipeline blocked by transpose kernel). Call transpose only when needed; otherwise copy data directly to dest_image. Destroy cublas...