TeslaCoder

3 comments by TeslaCoder

I could send you a screenshot of the Visual Profiler so you can see the problem. For example, the DGEMM kernel in one stream must overlap the small copy of A plus the transpose...

Thanks, David! The patch in the test branch fixes the segfault. As for the transpose kernel blocking the copy of C, I committed the changes to expose parallelism between the copy and...

One last note: although the fix in the hpl-gpu test branch resolved the segfault at the end of HPL, other recent changes in hpl-gpu caused a segfault when starting HPL,...