Paddle

[DCU] Fix build error on ROCm 5.7

Open • onepick opened this PR 1 year ago • 9 comments

PR Category

Others

PR Types

Bug fixes

Description

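Fix build errors and warnings on ROCm 5.7.

  1. Starting with ROCm 5, the header include rules changed: headers must be included from $ROCM_PATH/include.
  2. Add patches to the third_party warprnnt/warpctc libraries to fix build errors.
  3. ROCm 5.7 ships Clang 17, whose compiler checks are stricter, so -Werror is removed when building for ROCm.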
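The PR's real changes live in Paddle's CMake files and third-party patches, which are not shown in this thread. As a rough illustration of items 1 and 3 only, here is a Python sketch of the two flag rules: add `$ROCM_PATH/include` to the include path, and drop a bare `-Werror` for ROCm builds. The helper names `rocm_include_dir` and `rocm_compile_flags` are hypothetical, and falling back to `/opt/rocm` when `ROCM_PATH` is unset is an assumption based on the usual default install prefix.

```python
import os
from typing import List, Optional

# Assumption: /opt/rocm is the usual default install prefix when ROCM_PATH is unset.
DEFAULT_ROCM_PATH = "/opt/rocm"


def rocm_include_dir(rocm_path: Optional[str] = None) -> str:
    """Return $ROCM_PATH/include (hypothetical helper, illustration only)."""
    root = rocm_path or os.environ.get("ROCM_PATH", DEFAULT_ROCM_PATH)
    # ROCm is Linux-only, so join with "/" instead of the host's path separator.
    return root.rstrip("/") + "/include"


def rocm_compile_flags(base_flags: List[str], rocm_path: Optional[str] = None) -> List[str]:
    """Drop a bare -Werror and add the ROCm include dir (hypothetical helper).

    Only the exact token "-Werror" is removed. Targeted forms such as
    "-Werror=return-type" and ordinary "-W..." warning flags are kept, so
    Clang 17 still reports its new warnings without failing the build.
    """
    flags = [f for f in base_flags if f != "-Werror"]
    flags.append("-I" + rocm_include_dir(rocm_path))
    return flags


if __name__ == "__main__":
    # Example path only; pass whatever prefix your ROCm install uses.
    print(rocm_compile_flags(["-O2", "-Wall", "-Werror", "-Werror=return-type"], "/opt/rocm-5.7.0"))
```

Keeping targeted `-Werror=...` flags while removing the blanket one is a design choice of this sketch. The PR text only says that -Werror is removed for ROCm builds.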

onepick, Apr 22 '24 07:04

Your PR has been submitted. Thanks for your contribution to the open-source project! Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot[bot], Apr 22 '24 07:04

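@luotao1 All of the failing CI jobs fail while compiling CUDA code, with this error: nvcc fatal : Value '-Xcompiler' is not defined for option 'Werror'

However, when I build locally with the nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04 image, this error does not occur. Is anything different about your CI environment?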
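nvcc has its own `-Werror` option, which takes a comma-separated list of warning kinds such as `all-warnings` or `cross-execution-space-call`. A bare `-Werror` that reaches nvcc without a `-Xcompiler` prefix therefore consumes the next argument as its value. One plausible reading of the CI error, not confirmed in this thread, is that some flag edit left `-Werror` unwrapped just before a `-Xcompiler`. The toy Python model below shows that pairing. `check_nvcc_werror` and `wrap_host_flags` are hypothetical helpers, and this is not nvcc's actual parser.

```python
from typing import List, Optional

# Toy model: a subset of the warning kinds nvcc documents for its own -Werror option.
NVCC_WERROR_KINDS = {"all-warnings", "cross-execution-space-call", "reorder"}


def check_nvcc_werror(args: List[str]) -> Optional[str]:
    """Mimic how nvcc pairs -Werror with the next token; return an error or None."""
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-Xcompiler":
            # The next token is passed to the host compiler; nvcc does not parse it.
            i += 2
            continue
        if arg == "-Werror":
            # nvcc's own -Werror treats the next token as its value.
            value = args[i + 1] if i + 1 < len(args) else ""
            if not value or any(k not in NVCC_WERROR_KINDS for k in value.split(",")):
                return f"Value '{value}' is not defined for option 'Werror'"
            i += 2
            continue
        i += 1
    return None


def wrap_host_flags(host_flags: List[str]) -> List[str]:
    """Forward host-compiler flags through nvcc by prefixing each with -Xcompiler."""
    wrapped: List[str] = []
    for flag in host_flags:
        wrapped += ["-Xcompiler", flag]
    return wrapped


if __name__ == "__main__":
    # A bare -Werror followed by -Xcompiler reproduces the shape of the CI message.
    print(check_nvcc_werror(["-O3", "-Werror", "-Xcompiler", "-Wall"]))
    # Wrapping host-only warning flags keeps nvcc from parsing them.
    print(check_nvcc_werror(["-O3"] + wrap_host_flags(["-Wall", "-Werror"])))
```

The model only explains the shape of the message. Whether a local build reproduces it depends on the flags the build actually passes to nvcc, not just on the CUDA image version.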

onepick, Apr 23 '24 07:04

@qili93 Any updates?

onepick, Apr 25 '24 02:04

@onepick The current changes do not pass the PR-CI-BUILD CI. Please fix the existing CUDA build issue first to keep the code compatible. Thanks!

qili93, Apr 29 '24 07:04

Regarding nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04: the CUDA version shown in the CI log is different:

2024-04-29 15:51:37 ****************************************
2024-04-29 15:51:37 Paddle version: N/A
2024-04-29 15:51:37 Paddle With CUDA: N/A
2024-04-29 15:51:37 
2024-04-29 15:51:37 OS: ubuntu 20.04
2024-04-29 15:51:37 GCC version: (GCC) 8.2.0
2024-04-29 15:51:37 Clang version: N/A
2024-04-29 15:51:37 CMake version: version 3.18.0
2024-04-29 15:51:37 Libc version: glibc 2.31
2024-04-29 15:51:37 Python version: 3.10.13
2024-04-29 15:51:37 
2024-04-29 15:51:37 CUDA version: 11.8.89
2024-04-29 15:51:37 Build cuda_11.8.r11.8/compiler.31833905_0
2024-04-29 15:51:37 cuDNN version: 8.9.0
2024-04-29 15:51:37 Nvidia driver version: N/A
2024-04-29 15:51:37 Nvidia driver List: N/A
2024-04-29 15:51:37 ****************************************
2024-04-29 15:51:37 + bash /paddle/tools/get_cpu_info.sh

I suggest trying a local build with the registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.8-cudnn8.6-trt8.5-gcc82 image. @risemeup1, could you also confirm which base build image this pipeline uses: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/10589314/job/26072763

qili93, Apr 29 '24 07:04

Sorry to inform you that the CI runs for 475d533 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

paddle-ci-bot[bot], May 01 '24 03:05

@qili93 The failed CI job doesn't show any error message?

onepick, May 06 '24 02:05

2024-05-02 09:52:08 ****************
2024-05-02 09:52:08 0. Change compilation flag of warnings is not recommended. You must have one RD's (zhiqiu (Recommend), luotao1 or phlrain or Aurelius84) approval to use these methods.
2024-05-02 09:52:08 There are 1 approved errors.
2024-05-02 09:52:08 ****************
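The check script behind this message is not shown in the thread. As a guess at how a gate like "Change compilation flag of warnings" might spot such edits, here is a Python sketch that scans a unified diff for added or removed lines containing `-Werror` or `-Wno-...`. The function name `warning_flag_changes` and the regex are hypothetical and are not Paddle's actual CI logic.

```python
import re
from typing import List

# Hypothetical rule: any -Werror or -Wno-<name> token counts as a warning-flag change.
WARNING_FLAG = re.compile(r"-Werror\b|-Wno-[\w-]+")


def warning_flag_changes(diff_text: str) -> List[str]:
    """Return added/removed lines of a unified diff that touch a warning flag (toy check)."""
    hits = []
    for line in diff_text.splitlines():
        # Skip the "---"/"+++" file headers; only changed content lines matter.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")) and WARNING_FLAG.search(line[1:]):
            hits.append(line)
    return hits


if __name__ == "__main__":
    diff = (
        "--- a/cmake/flags.cmake\n"
        "+++ b/cmake/flags.cmake\n"
        "@@ -1,2 +1,2 @@\n"
        '-set(FLAGS "-Wall -Werror")\n'
        '+set(FLAGS "-Wall")\n'
    )
    # Removing -Werror shows up as the deleted line that still contains it.
    print(warning_flag_changes(diff))
```

A real gate would also need to know which files and approvers are involved. This sketch only covers the text-matching part.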

onecatcn, May 06 '24 06:05


Got it. @zhiqiu @luotao1 The build produces so many warnings mainly because ROCm 5.7 upgraded the compiler to Clang 17, whose checks are smarter and stricter, so many more warnings are now exposed.

onepick, May 06 '24 07:05