FBGEMM
redesign parallelism for small B
Summary: As titled, redesign the parallelism for the small-B (1 < B < 64) and small-T (T <= 320) case, since the current implementation only benefits workloads with large B * T.
Differential Revision: D54270887