Yuanqiang Liu
@hwu36 Could you please look at this issue? Thank you!
By the way, on the same GPU, the conv2d kernel doesn't show this difference.
I tried compiling my kernel with NVCC 11.4. Its profiled time improved from **5.4512ms** to **5.1043ms**, but the CUTLASS profiler's time is about **4.28195ms**, so my kernel is still slower.
I have locked the GPU clock to **1005 MHz** on a **T4**, and I use **5 warmup** runs and **100 profiling iterations**. The CUTLASS profiler reports **3.72416ms**, while my kernel takes **5.1481ms**....
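To make the comparison like-for-like, both measurements should follow the same recipe: a few untimed warmup runs, then the mean over the timed iterations. Below is a minimal CPU-side sketch of that recipe, assuming a hypothetical helper `mean_time_ms` and an arbitrary callable as the workload. It is not how the CUTLASS profiler is implemented. For GPU kernels, which launch asynchronously, the timed region would typically be bracketed with CUDA events (`cudaEventRecord`, `cudaEventSynchronize`, `cudaEventElapsedTime`) rather than host clocks.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: mean wall-clock time (ms) of `work` over `iters`
// timed runs, after `warmup` untimed runs (e.g. 5 and 100 as above).
double mean_time_ms(const std::function<void()>& work, int warmup, int iters) {
    assert(warmup >= 0 && iters > 0);
    // Untimed warmup runs absorb one-off costs (lazy init, cold caches).
    for (int i = 0; i < warmup; ++i) work();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) work();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    // Average over the timed iterations only; warmup is excluded.
    return total.count() / iters;
}
```

Usage: `mean_time_ms([] { /* launch + sync */ }, 5, 100)`. Averaging the whole timed loop, rather than timing each run individually, keeps per-measurement overhead out of the result.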
> @qingyunqu Can you please resolve conflicts? Thank you!

Done.
Hi, could anyone review this PR?
Thanks very much for the review. I will fix these later.
> Thanks! Sorry for not addressing this sooner. I'm not an expert on this part of the code, but I will try to help. Could you first rebase your PR...
> Also, could you add a test case with which we could have found this bug?

This PR fixes a memory leak in `npy_file`'s move assignment operator. It leaks when: ```...
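The general shape of this kind of leak can be illustrated with a minimal sketch. `OwnedBuffer` below is a hypothetical class, not the actual `npy_file`; it assumes the object owns a single `new[]`-allocated buffer. The point is that move assignment must release the buffer the target already owns before adopting the source's; otherwise that old buffer is never freed.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a class that owns a heap buffer.
// Only the ownership pattern is illustrated, not npy_file itself.
class OwnedBuffer {
public:
    explicit OwnedBuffer(std::size_t n) : data_(new char[n]), size_(n) {}
    ~OwnedBuffer() { delete[] data_; }

    // Non-copyable: exactly one owner per buffer.
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;

    // Move constructor: steal the buffer and leave `other` empty.
    OwnedBuffer(OwnedBuffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    // A leaky version would just assign other.data_ to data_, dropping
    // the buffer this object already owned. Freeing it first avoids that.
    OwnedBuffer& operator=(OwnedBuffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;  // release the buffer we currently own
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Freeing before adopting is one design; a swap-based move (hand the old buffer to `other`, whose destructor then frees it) is another common choice. A test like the one requested would assign into an object that already owns data and run under a leak checker such as AddressSanitizer.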
> @qingyunqu Can you please resolve conflicts? Thank you!

OK. I am currently unable to access GitHub via SSH. I will rebase and resolve the conflicts as soon as I can push to...