TeslaCoder
- Call fast kernel cublas dgemmNT
- Copy C before Copy+transpose A+B (fix for pipeline blocked by transpose kernel)
- Call transpose only when needed, otherwise copy data directly to dest_image
- Destroy cublas...
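The "transpose only when needed, otherwise copy directly" change can be illustrated with a small sketch. It is written in plain Python so it runs without a GPU and is not the project's actual code. The function name `stage_matrix`, the flat row-major layout, and the `needs_transpose` flag are all hypothetical, chosen only to show the two paths.

```python
def stage_matrix(src, rows, cols, needs_transpose):
    """Return a new flat buffer holding src, transposed only if requested.

    Assumption for illustration: src is a flat, row-major list of
    rows * cols values. The real code operates on device buffers.
    """
    if len(src) != rows * cols:
        raise ValueError("src length does not match rows * cols")

    if not needs_transpose:
        # Fast path: the layout already matches the destination,
        # so a plain copy replaces the transpose pass entirely.
        return list(src)

    # Slow path: element (r, c) of src lands at (c, r) of the
    # cols x rows result, which is also stored row-major.
    dest = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            dest[c * rows + r] = src[r * cols + c]
    return dest
```

Calling `stage_matrix(a, 2, 3, False)` returns an independent copy of `a`, while passing `True` returns the 3 x 2 transpose. Skipping the transpose pass when the layout already matches is the kind of saving the change note describes. Whether it removes the pipeline stall depends on the real kernel scheduling, which this sketch does not model.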