Build error when compiling the latest code from source

Open leyiwang opened this issue 10 months ago • 5 comments

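I saw from the recent commits that rtp-llm now supports cosyvoice. I tried to build from source, but the command below fails. How should the source be compiled?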

bazelisk build //maga_transformer:maga_transformer --jobs 100 --verbose_failures --config=cuda12_2

The error output is as follows:

2025/02/24 03:42:08 Downloading https://releases.bazel.build/6.4.0/release/bazel-6.4.0-linux-x86_64...
Downloading: 52 MB out of 52 MB (100%) 
INFO: Repository local_config_cuda instantiated at:
  /home/wangleyi/rtp-llm/WORKSPACE:7:15: in <toplevel>
Repository rule cuda_configure defined at:
  /home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl:1427:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 1396, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 959, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 647, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn", "nccl"])
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 624, column 26, in find_cuda_config
                exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/common.bzl", line 217, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any nccl.h matching version '2' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
ERROR: /home/wangleyi/rtp-llm/WORKSPACE:7:15: fetching cuda_configure rule //external:local_config_cuda: Traceback (most recent call last):
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 1396, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 959, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 647, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn", "nccl"])
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 624, column 26, in find_cuda_config
                exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/common.bzl", line 217, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any nccl.h matching version '2' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
ERROR: Analysis of target '//maga_transformer:maga_transformer' failed; build aborted: no such package '@local_config_cuda//cuda': Repository command failed
Could not find any nccl.h matching version '2' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
INFO: Elapsed time: 0.141s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
    currently loading: 
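
The error above lists the subdirectories of /usr/local/cuda/ that were searched for nccl.h. A minimal sketch for repeating that search by hand is shown here; the function name `find_header` is hypothetical, and the directory list is copied from the error message rather than from the configure script itself.

```shell
#!/usr/bin/env bash
# find_header ROOT NAME
# Print every path where NAME exists under the subdirectories listed in the
# error message. Returns 0 if at least one match was found, 1 otherwise.
find_header() {
  local root="${1%/}" name="$2" dir found=1
  # The unquoted glob mirrors the 'include/*-linux-gnu' entry in the log.
  for dir in "$root" "$root/include" "$root/include/cuda" \
             "$root"/include/*-linux-gnu \
             "$root/extras/CUPTI/include" "$root/include/cuda/CUPTI" \
             "$root/local/cuda/extras/CUPTI/include"; do
    if [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      found=0
    fi
  done
  return "$found"
}

# Example: find_header /usr/local/cuda/ nccl.h || echo "nccl.h not under the CUDA root"
```

If the function prints nothing, the header may be installed elsewhere on the system (for example under /usr/include, which is an assumption about a typical package layout, not something the log states).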

leyiwang avatar Feb 24 '25 03:02 leyiwang

cuda12_2 is deprecated; use cuda12 or cuda12_6 instead.

Vinkle-hzt avatar Feb 25 '25 11:02 Vinkle-hzt

cuda12_2 is deprecated; use cuda12 or cuda12_6 instead.

Previously I initialized the Python environment in conda and built from source with the following commands:

pip3 install -r open_source/deps/requirements_torch_gpu_cuda12.txt
bazelisk build //maga_transformer:maga_transformer --jobs 100 --verbose_failures --config=cuda12_2

After changing the build config to cuda12, I ran the following command:

bazelisk build //maga_transformer:maga_transformer --jobs 100 --verbose_failures --config=cuda12

It fails with the following error:

INFO: Build options --action_env and --host_action_env have changed, discarding analysis cache.
INFO: Repository local_config_cuda instantiated at:
  /home/wangleyi/rtp-llm/WORKSPACE:7:15: in <toplevel>
Repository rule cuda_configure defined at:
  /home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl:1427:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 1396, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 959, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 647, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn", "nccl"])
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 624, column 26, in find_cuda_config
                exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/common.bzl", line 217, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cuda.h matching version '12.4' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
ERROR: /home/wangleyi/rtp-llm/WORKSPACE:7:15: fetching cuda_configure rule //external:local_config_cuda: Traceback (most recent call last):
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 1396, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 959, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 647, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn", "nccl"])
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/cuda_configure.bzl", line 624, column 26, in find_cuda_config
                exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
        File "/home/wangleyi/rtp-llm/3rdparty/cuda_config/common.bzl", line 217, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cuda.h matching version '12.4' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
ERROR: Analysis of target '//maga_transformer:maga_transformer' failed; build aborted: no such package '@local_config_cuda//cuda': Repository command failed
Could not find any cuda.h matching version '12.4' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
of:
        '/usr/local/cuda/'
INFO: Elapsed time: 0.166s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (4 packages loaded, 6 targets configured)
    currently loading: 

In this situation, do I need to reinstall CUDA? Is there an image with a working build environment? 😂

leyiwang avatar Feb 25 '25 12:02 leyiwang

Yes. cuda12 corresponds to CUDA 12.4, and cuda12_6 corresponds to CUDA 12.6.
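
To check which of these two configs fits a given machine, one option is to read the CUDA_VERSION macro from cuda.h, the header named in the error. The sketch below assumes the header is at /usr/local/cuda/include/cuda.h and that the macro encodes the version as major*1000 + minor*10 (for example 12040 for 12.4). The function names are hypothetical, and only the two version-to-config pairs mentioned in this thread are mapped.

```shell
#!/usr/bin/env bash
# cuda_header_version FILE
# Print "major.minor" parsed from the "#define CUDA_VERSION <n>" line in FILE.
cuda_header_version() {
  awk '$1 == "#define" && $2 == "CUDA_VERSION" && $3 ~ /^[0-9]+$/ {
         v = $3 + 0
         printf "%d.%d\n", int(v / 1000), int((v % 1000) / 10)
         exit
       }' "$1"
}

# pick_cuda_config VERSION
# Map a CUDA version to the Bazel config name given in this thread.
# Returns 1 for any version the thread does not mention.
pick_cuda_config() {
  case "$1" in
    12.4) echo "cuda12" ;;
    12.6) echo "cuda12_6" ;;
    *) return 1 ;;
  esac
}

# Assumed default header location; adjust if CUDA is installed elsewhere.
header="/usr/local/cuda/include/cuda.h"
if [ -f "$header" ]; then
  version="$(cuda_header_version "$header")"
  if config="$(pick_cuda_config "$version")"; then
    echo "CUDA $version -> --config=$config"
  else
    echo "CUDA ${version:-unknown} has no config named in this thread" >&2
  fi
fi
```

If the reported version is neither 12.4 nor 12.6, the error in this thread suggests the build will not find a matching cuda.h, so installing one of those toolkit versions is the likely next step.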

Vinkle-hzt avatar Feb 25 '25 12:02 Vinkle-hzt

Yes. cuda12 corresponds to CUDA 12.4, and cuda12_6 corresponds to CUDA 12.6.

I want to build from source with CUDA 12.6. How should I go about it?

Liaukx avatar Feb 26 '25 09:02 Liaukx

Symlinks like the following can work around this problem:

ln -s /usr/include/nccl.h /usr/local/cuda/include/nccl.h
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2.19.3 /usr/local/cuda/lib64/libnccl.so.2.19.3
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64/libnccl.so.2
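
The NCCL library version and install paths differ between machines, so a slightly more defensive variant of these symlinks may be useful. The sketch below wraps `ln -s` in a hypothetical helper, `link_if_missing`, that skips a link when the source file is absent or the destination already exists; the paths in the usage comments are the ones from this comment and are assumptions about the local system.

```shell
#!/usr/bin/env bash
# link_if_missing SRC DST
# Create a symlink DST -> SRC only when SRC exists and DST does not.
# Returns 1 if SRC is missing, 0 otherwise.
link_if_missing() {
  local src="$1" dst="$2"
  if [ ! -e "$src" ]; then
    echo "skip: $src not found" >&2
    return 1
  fi
  if [ -e "$dst" ] || [ -L "$dst" ]; then
    echo "skip: $dst already exists" >&2
    return 0
  fi
  ln -s "$src" "$dst"
}

# Usage (run with sufficient privileges; adjust the NCCL version to match
# the files actually present under /usr/lib/x86_64-linux-gnu):
#   link_if_missing /usr/include/nccl.h /usr/local/cuda/include/nccl.h
#   link_if_missing /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64/libnccl.so.2
```

Checking for an existing destination first avoids the `ln` failure that occurs when the command is rerun after a partial setup.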

Z-NAVY avatar Mar 11 '25 08:03 Z-NAVY