Mengchi Zhang

Results 4 issues of Mengchi Zhang

Summary: Revert D34812414 (https://github.com/pytorch/FBGEMM/commit/9416c3729acf2e2baa611ad87b67dce1e3b596fc) because we don't support BWD and FP32 in permute-baddbmm-permute ops. Will need to reimplement it later with BWD and FP32. See this post for details: https://fb.workplace.com/groups/328887315320818/permalink/563064075236473/...

fb-exported
cla signed

Hi all, I tried to write a __global__ kernel and a __device__ function, and to compile the __device__ function as relocatable device code (-fgpu-rdc). Then I tried to use the extractkernel script to dump...

Summary: As titled, redesign for small B (1 < B < 64) and small T (T

fb-exported
cla signed

Summary: As titled. Differential Revision: D55098702

fb-exported
cla signed