Yuanqiang Liu

20 comments by Yuanqiang Liu

@hwu36 Could you please look at this issue? Thank you!

By the way, on the same GPU, the conv2d kernel doesn't show this difference.

I tried compiling my kernel with NVCC 11.4. The profiling time improved from **5.4512ms** to **5.1043ms**, but the CUTLASS profiler's time is about **4.28195ms**.

I locked the GPU clock to **1005 MHz** on a **T4** and used **5 warmup** runs and **100 profiling iterations**. The CUTLASS profiler's result is **3.72416ms**; my kernel's result is **5.1481ms**....
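The measurement procedure above (a few untimed warmup runs, then averaging over timed iterations) can be sketched as a small host-side C++ helper. This is a minimal illustration under assumptions, not the CUTLASS profiler's implementation: `time_ms` is a hypothetical name, and it measures wall-clock time with `std::chrono::steady_clock`. For a CUDA kernel the callable must block until the kernel finishes (for example by calling `cudaDeviceSynchronize()`), otherwise only launch overhead is measured; alternatively, CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`) time the work on the GPU side.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical timing helper (not the CUTLASS profiler's implementation):
// runs `fn` `warmup` times untimed, then `iters` times timed, and returns
// the mean wall-clock time per call in milliseconds.
double time_ms(const std::function<void()>& fn, int warmup = 5, int iters = 100) {
    assert(iters > 0);  // avoid dividing by zero below
    for (int i = 0; i < warmup; ++i) {
        fn();  // untimed warmup: lazy initialization, caches, clock ramp-up
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        fn();  // timed runs
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / iters;  // mean time per call
}
```

A call would look like `time_ms([] { launch(); cudaDeviceSynchronize(); })`, where `launch` stands for a hypothetical kernel-launch wrapper. Comparing two kernels this way is only meaningful when both use the same warmup count, iteration count, and locked clock.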

> @qingyunqu Can you please resolve conflicts? Thank you!

Done.

Hi, could anyone review this PR?

Thanks very much for the review. I will fix these later.

> Thanks! Sorry for not addressing this sooner. I'm not an expert on this part of the code, but I will try to help. Could you first rebase your PR...

> Also, could you add a test case with which we could have found this bug?

This PR fixes a memory leak in `npy_file`'s move assignment operator. It will leak when: ...
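The comment is truncated, so the actual `npy_file` code is not shown here. A common way a move assignment operator leaks is that it overwrites the pointer it owns without first freeing it. The sketch below illustrates that pattern and its fix on a hypothetical `Buffer` class (not the real `npy_file`); the static `live` counter is an assumed device for making the leak observable in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a class that owns a heap buffer (NOT the real npy_file).
class Buffer {
public:
    static int live;  // number of currently allocated buffers, used for leak checks

    Buffer() = default;
    explicit Buffer(std::size_t n) : data_(new char[n]), size_(n) { ++live; }
    ~Buffer() { release(); }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Move constructor: steal the buffer and leave `other` empty.
    Buffer(Buffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    // A leak-prone version would assign data_ = other.data_ directly, so the
    // buffer *this already owned is never freed. Releasing it first fixes that.
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            release();  // free whatever *this currently owns
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    std::size_t size() const { return size_; }

private:
    void release() {
        if (data_ != nullptr) {
            assert(live > 0);  // every owned buffer was counted on allocation
            delete[] data_;
            data_ = nullptr;
            --live;
        }
        size_ = 0;
    }

    char* data_ = nullptr;
    std::size_t size_ = 0;
};

int Buffer::live = 0;
```

A regression test in this style move-assigns into an object that already owns a buffer and checks that `Buffer::live` drops by one; with the leak-prone version the count would stay at two.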

> @qingyunqu Can you please resolve conflicts? Thank you!

OK. I failed to access GitHub via SSH. I will rebase and resolve the conflicts as soon as I can push to...