Paddle
Optimize performance of depthwise_conv_bwd
PR types
Performance optimization
PR changes
OPs
Describe
Optimize the performance of depthwise_conv_bwd when computing the gradient with respect to the input.
- Method:
  - Reduce modulo operations and other redundant calculations
  - Modify the block/grid launch configuration
- Result:
| config | pytorch | paddle dev | paddle this PR | speedup |
|---|---|---|---|---|
| input[2048, 1024, 4, 4] filter[1024, 1, 4, 4] stride=1 pad=0 dilation=1 | 1.1070ms | 2.9660ms | 1.0798ms | 2.75x |
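
As an illustration of the "reduce modulo calculations" item under Method, here is a minimal host-side C++ sketch of one common way to do it when a flat element index is unpacked into NCHW coordinates. It is not the code from this PR: the names `Index4`, `decompose_with_modulo`, `decompose_without_modulo`, and `decompositions_agree` are hypothetical, and the actual kernel may use a different scheme. The idea is to keep each quotient and recover the remainder with a multiply and a subtract, so each dimension costs one division instead of a chain of divisions plus a modulo.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: coordinates of one element in an NCHW tensor.
struct Index4 {
  int n, c, h, w;
};

// Baseline: every coordinate re-derives its quotient and applies a modulo.
inline Index4 decompose_with_modulo(int64_t idx, int C, int H, int W) {
  Index4 r;
  r.w = static_cast<int>(idx % W);
  r.h = static_cast<int>((idx / W) % H);
  r.c = static_cast<int>((idx / W / H) % C);
  r.n = static_cast<int>(idx / W / H / C);
  return r;
}

// Variant: compute each quotient once and reuse it; the remainder is
// obtained as (dividend - quotient * divisor) instead of a modulo.
inline Index4 decompose_without_modulo(int64_t idx, int C, int H, int W) {
  const int64_t q0 = idx / W;  // index with the w dimension stripped
  const int64_t q1 = q0 / H;   // index with the h dimension stripped
  const int64_t q2 = q1 / C;   // index with the c dimension stripped (= n)
  Index4 r;
  r.w = static_cast<int>(idx - q0 * W);
  r.h = static_cast<int>(q0 - q1 * H);
  r.c = static_cast<int>(q1 - q2 * C);
  r.n = static_cast<int>(q2);
  return r;
}

// Checks that both decompositions agree for every element of an
// N x C x H x W tensor.
inline bool decompositions_agree(int N, int C, int H, int W) {
  const int64_t total = static_cast<int64_t>(N) * C * H * W;
  for (int64_t idx = 0; idx < total; ++idx) {
    const Index4 a = decompose_with_modulo(idx, C, H, W);
    const Index4 b = decompose_without_modulo(idx, C, H, W);
    if (a.n != b.n || a.c != b.c || a.h != b.h || a.w != b.w) return false;
  }
  return true;
}
```

Calling `decompositions_agree(2, 3, 4, 4)` confirms that the two variants produce identical coordinates for a small tensor. A compiler may already fuse a division and modulo on the same operands, so any gain from this pattern should be measured on the target kernel rather than assumed.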
Your PR has been submitted. Thanks for your contribution to this open-source project! Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.