GemmEpilogueOp with series of CUTLASS kernel

Open YKTian-x2b opened this issue 1 year ago • 7 comments

PR Category

Others

PR Types

Others

Description

P-card-71501

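The goal is to fuse patterns of the form matmul + add + act. GemmEpilogueOp is written with CUTLASS and generates multiple kernel configurations in search of a better fused implementation.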

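matmul_add_act_fuse_pass supports two paths: cublasLt (FcOp) and CUTLASS (GemmEpilogueOp). Users choose whether to enable the CUTLASS-implemented op (GemmEpilogueOp) by modifying analysis_config through the Exp_EnableUseCutlass() API. During create_predictor, analysis_config is read, the use_cutlass attribute is set on matmul_add_act_fuse_pass, and the pass is added to the passManager. When the passManager is run, the InitializePatterns method of the matmul_add_act_fuse_pass object is called. Based on the use_cutlass attribute value it reads, the pass generates either the pattern for GemmEpilogueOp or the pattern for FcOp, which is how one of the two paths is selected.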
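The selection flow described above can be sketched with a toy model. This is not Paddle's API: every class and function name below (ToyAnalysisConfig, ToyMatmulAddActFusePass, ToyPassManager, toy_create_predictor) is a hypothetical stand-in, and the pattern strings are illustrative only. It shows only the order of events: the config flag is set, predictor creation copies it onto the pass as an attribute, and the pass picks its pattern when the pass manager runs.

```python
class ToyAnalysisConfig:
    """Hypothetical stand-in for analysis_config; tracks only the CUTLASS switch."""

    def __init__(self):
        self.use_cutlass = False

    def exp_enable_use_cutlass(self):
        # Mirrors the role of Exp_EnableUseCutlass(): flip the flag in the config.
        self.use_cutlass = True


class ToyMatmulAddActFusePass:
    """Hypothetical stand-in for matmul_add_act_fuse_pass."""

    def __init__(self):
        self.attrs = {}

    def set_attr(self, name, value):
        self.attrs[name] = value

    def initialize_patterns(self):
        # The pass reads its own attribute and picks the fusion target.
        if self.attrs.get("use_cutlass", False):
            return ["matmul+add+act -> GemmEpilogueOp"]
        return ["matmul+add+act -> FcOp"]


class ToyPassManager:
    """Hypothetical pass manager: collects passes and asks each for patterns."""

    def __init__(self):
        self.passes = []

    def add_pass(self, fuse_pass):
        self.passes.append(fuse_pass)

    def run(self):
        patterns = []
        for fuse_pass in self.passes:
            patterns.extend(fuse_pass.initialize_patterns())
        return patterns


def toy_create_predictor(config):
    # Predictor creation reads the config, sets the attribute, registers the pass.
    fuse_pass = ToyMatmulAddActFusePass()
    fuse_pass.set_attr("use_cutlass", config.use_cutlass)
    manager = ToyPassManager()
    manager.add_pass(fuse_pass)
    return manager


if __name__ == "__main__":
    config = ToyAnalysisConfig()
    config.exp_enable_use_cutlass()
    print(toy_create_predictor(config).run())
```

In this sketch the choice is fixed when the predictor is created, so changing the flag afterwards does not affect a pass manager that has already been built.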

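For the elementwise add, the new op (GemmEpilogueOp) supports two bias shapes, [1, N] and [M, N] (where [M, N] is the matmul output shape). It supports both paddle.add(paddle.matmul(x, w), y) and paddle.add(y, paddle.matmul(x, w)), that is, either order of the add operands. It supports ReLU and GELU activations.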
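As a reference for what the fused op is expected to compute, here is a minimal plain-Python sketch of act(matmul(x, w) + bias) covering both bias shapes. The function names are hypothetical and this is not the CUTLASS kernel. The GELU here is the erf-based form, which is an assumption: the PR does not say whether the kernel uses the erf form or a tanh approximation.

```python
import math


def matmul(x, w):
    """Plain-Python product of an [M, K] matrix and a [K, N] matrix."""
    k = len(w)
    n = len(w[0])
    return [[sum(row[p] * w[p][j] for p in range(k)) for j in range(n)] for row in x]


def relu(v):
    return max(v, 0.0)


def gelu(v):
    # erf-based GELU (assumed form; the PR does not specify the variant).
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


ACTIVATIONS = {"relu": relu, "gelu": gelu}


def gemm_epilogue_reference(x, w, bias, act="relu"):
    """Reference semantics: act(x @ w + bias), bias shaped [1, N] or [M, N]."""
    out = matmul(x, w)
    m, n = len(out), len(out[0])
    if len(bias) not in (1, m) or any(len(row) != n for row in bias):
        raise ValueError("bias must have shape [1, N] or [M, N]")
    fn = ACTIVATIONS[act]
    broadcast = len(bias) == 1  # a [1, N] bias is reused for every output row
    return [
        [fn(out[i][j] + bias[0 if broadcast else i][j]) for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0], [0.0, -1.0]]
    print(gemm_epilogue_reference(x, w, [[0.5, 0.5]], act="relu"))
```

Because addition is commutative, paddle.add(y, paddle.matmul(x, w)) and paddle.add(paddle.matmul(x, w), y) describe the same computation, so a single reference function covers both operand orders.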

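The new op shares the FCInferMeta function with the original FcOp, and I relaxed that function's constraints so that it matches the additional patterns. In other words, patterns that FcOp cannot handle are currently filtered out only by the pass's constraints; the corresponding checks in FCInferMeta have been removed.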

Performance: compared with the unfused ops, end-to-end timing of GemmEpilogueOp on large models with a batch of 2 shows a speedup of about 2.0% on llama and about 8.5% on chatglm2.

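TODO: the pass currently fuses the ReLU and GELU activations. Three more activations are already implemented in the kernel (currently commented out) but are not yet supported in the pass. At the kernel level, they can be used simply by uncommenting them.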

YKTian-x2b avatar Feb 21 '24 08:02 YKTian-x2b

Your PR has been submitted. Thanks for your contribution! Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot[bot] avatar Feb 21 '24 08:02 paddle-bot[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 21 '24 08:02 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Feb 21 '24 08:02 CLAassistant

Sorry to inform you that f959b87's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot[bot] avatar Mar 27 '24 03:03 paddle-ci-bot[bot]

Sorry to inform you that 519a02b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot[bot] avatar Apr 06 '24 03:04 paddle-ci-bot[bot]

Some occurrences of "fc" in the PR description also need to be updated, and please improve the PR title as well~

yuanlehome avatar Apr 17 '24 05:04 yuanlehome

Sorry to inform you that 008e268's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot[bot] avatar May 02 '24 03:05 paddle-ci-bot[bot]