List of GEMM+epilogues fusion needed for Transformer Engine

Open bghimireamd opened this issue 1 year ago • 1 comments

The ones we need for Transformer Engine are the following:

  1. CUBLASLT_EPILOGUE_GELU_AUX
     step 1: matrix multiplication
     step 2: apply GELU
     step 3: store the pre-GELU GEMM output to a separate auxiliary matrix (basically a matrix copy)

    • I think we can use CK's gemm_fastgelu for this.
  2. CUBLASLT_EPILOGUE_DGELU
     step 1: matrix multiplication
     step 2: apply the derivative of GELU

  3. CUBLASLT_EPILOGUE_BIAS
     step 1: matrix multiplication A(M, K) x B(K, N) = C(M, N)
     step 2: obtain the bias vector (1, N), e.g. [0.1, 0.2, 0.3]
     step 3: broadcast the bias vector: simply replicate it M times along the rows, resulting in a broadcast bias matrix with dimensions (M x N)
     e.g. A(4, 2), B(2, 3), C(4, 3)
     Broadcast bias:

                |0.1 0.2 0.3|
                |0.1 0.2 0.3|
                |0.1 0.2 0.3|
                |0.1 0.2 0.3|

     C = C + Broadcast bias

  4. CUBLASLT_EPILOGUE_BGRADB
     Apply the bias gradient to the input matrix B. The bias size corresponds to the number of columns of the matrix D. The reduction happens over the GEMM’s “k” dimension. Store the bias gradient in the bias buffer.

  5. CUBLASLT_EPILOGUE_GELU_AUX_BIAS
     * fusion of gelu and bias

  6. CUBLASLT_EPILOGUE_DGELU_BGRAD
     * fusion of dgelu and bgrad
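
To pin down the forward epilogues in the list (BIAS, GELU_AUX, GELU_AUX_BIAS), here is a minimal CPU reference sketch. It is not CK or cuBLASLt code. It assumes row-major storage, a bias of length N broadcast down the rows (as in the 4 x 3 example), the erf-based GELU (a fused kernel such as a fast-GELU variant may use a tanh approximation instead), and that the auxiliary matrix receives the pre-GELU values. `Matrix`, `gelu`, and `gemm_epilogue_ref` are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix in a flat buffer (hypothetical helper type for this sketch).
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Exact (erf-based) GELU; assumed here, a fused kernel may approximate it.
inline float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Reference for C = A * B with an optional bias add and optional GELU.
//   bias: nullptr, or N values added to every row of the GEMM output.
//   aux:  if non-null, receives the pre-GELU values (assumed GELU_AUX behaviour).
Matrix gemm_epilogue_ref(const Matrix& A, const Matrix& B,
                         const std::vector<float>* bias,
                         bool apply_gelu, Matrix* aux) {
    assert(A.cols == B.rows);
    assert(bias == nullptr || bias->size() == B.cols);
    assert(aux == nullptr || (aux->rows == A.rows && aux->cols == B.cols));

    Matrix C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t j = 0; j < B.cols; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < A.cols; ++k) {
                acc += A.at(i, k) * B.at(k, j);
            }
            if (bias != nullptr) acc += (*bias)[j];   // broadcast bias over rows
            if (aux != nullptr) aux->at(i, j) = acc;  // keep the pre-activation
            C.at(i, j) = apply_gelu ? gelu(acc) : acc;
        }
    }
    return C;
}
```

Under these assumptions, a bias with `apply_gelu == false` models BIAS, no bias with `apply_gelu == true` and a non-null `aux` models GELU_AUX, and passing both models GELU_AUX_BIAS. A reference like this could serve as a host-side check for a fused kernel.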

bghimireamd Apr 11 '23 19:04

Synced with @bghimireamd: CK already has CUBLASLT_EPILOGUE_GELU_AUX, CUBLASLT_EPILOGUE_BIAS, and CUBLASLT_EPILOGUE_GELU_AUX_BIAS. We can quickly add CUBLASLT_EPILOGUE_DGELU. For CUBLASLT_EPILOGUE_BGRADB, we need to double-check.
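
For the two backward epilogues still open here, a small CPU reference may help when checking what a fused kernel should produce. This is only a sketch under assumptions: row-major flat buffers, the erf-based GELU derivative, DGELU read as multiplying the GEMM output element-wise by GELU'(aux) where aux is the saved pre-GELU input, and BGRADB read as summing B(K, N) over k to get a length-N bias gradient. `dgelu`, `gemm_dgelu_ref`, and `bgradb_ref` are hypothetical names, not CK or cuBLASLt APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Derivative of erf-based GELU: d/dx [x * Phi(x)] = Phi(x) + x * phi(x),
// where Phi and phi are the standard normal CDF and PDF.
inline float dgelu(float x) {
    const float inv_sqrt2 = 0.70710678f;     // 1 / sqrt(2)
    const float inv_sqrt_2pi = 0.39894228f;  // 1 / sqrt(2 * pi)
    const float cdf = 0.5f * (1.0f + std::erf(x * inv_sqrt2));
    const float pdf = inv_sqrt_2pi * std::exp(-0.5f * x * x);
    return cdf + x * pdf;
}

// DGELU sketch (assumed semantics): D = (A * B) .* gelu'(aux).
// A is M x K, B is K x N, aux and D are M x N, all row-major.
std::vector<float> gemm_dgelu_ref(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  const std::vector<float>& aux,
                                  std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && aux.size() == M * N);
    std::vector<float> D(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            D[i * N + j] = acc * dgelu(aux[i * N + j]);
        }
    }
    return D;
}

// BGRADB sketch (assumed semantics): dbias[j] = sum over k of B[k][j].
// The result has N entries, one per column of D.
std::vector<float> bgradb_ref(const std::vector<float>& B,
                              std::size_t K, std::size_t N) {
    assert(B.size() == K * N);
    std::vector<float> dbias(N, 0.0f);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t j = 0; j < N; ++j) {
            dbias[j] += B[k * N + j];
        }
    }
    return dbias;
}
```

In a real fused kernel the BGRADB reduction would run alongside the GEMM rather than as a separate pass; the separate function here only states the expected result.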

zjing14 Apr 25 '23 16:04