Mengchi Zhang
Summary: Revert D34812414 (https://github.com/pytorch/FBGEMM/commit/9416c3729acf2e2baa611ad87b67dce1e3b596fc) because we don't support BWD and FP32 in permute-baddbmm-permute ops. Will need to reimplement it later with BWD and FP32. See this post for details: https://fb.workplace.com/groups/328887315320818/permalink/563064075236473/...
Hi all, I tried to write a __global__ kernel and a __device__ function and compile the __device__ function in a relocatable way (-fgpu-rdc). Then I tried to use the extractkernel script to dump...
Summary: As titled, redesign for small B (1 < B < 64) and small T (T
Summary: As titled. Differential Revision: D55098702