Ying Zhang
This is the original gemm_universal_with_broadcast PR, written in April. The unit test it added, test/unit/gemm/device/gemm_broadcast_test.cu, passed at that time, but it no longer passes.
Summary: This diff reverts D42977698 (https://github.com/facebookincubator/AITemplate/commit/5173b284ebfef102ad1ab4a46ec2b9604f1f3275)
ATT, this is to support variable sequence length in the destination tensor.
This PR contains the following changes:
* Add var-seq-len support to the fp16 / bf16 fwd kernel.
* Add unit tests for var-seq-len fwd.
* Add a simple benchmark to compare fp16 /...