Ying Zhang

Results 4 issues of Ying Zhang

This is the original gemm_universal_with_broadcast PR, written in April. The added unit test, test/unit/gemm/device/gemm_broadcast_test.cu, passed at that time, but it no longer passes.

Summary: This diff reverts D42977698 (https://github.com/facebookincubator/AITemplate/commit/5173b284ebfef102ad1ab4a46ec2b9604f1f3275)

CLA Signed
fb-exported

ATT, this is to support variable sequence length in the destination tensor.

inactive-30d

This PR contains the following changes:
* Add var-seq-len support to the fp16 / bf16 fwd kernel.
* Add unit tests for var-seq-len fwd.
* Add a simple benchmark to compare fp16 /...