
Required library version not found: libbitsandbytes_cuda100_nocublaslt.so.

Open yiyepiaoling0715 opened this issue 1 year ago • 1 comments

How can I solve this problem? Is it a CUDA version issue?


CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 100
from .peft_model import (
/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in
CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.
CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 100
CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.
from .lora import LoraConfig, LoraModelCUDA SETUP: Setup Failed!

File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. Traceback (most recent call last): File "src/train.py", line 25, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from peft import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from 
.peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in from .lora import LoraConfig, LoraModel
import bitsandbytes.functional as F File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in

File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/tiger/lite_sdk/cc/lib:/opt/tiger/ss_lib/so did not contain libcudart.so as expected! Searching further paths... 
warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//ml.bytedance.net')} warn(msg) https://github.com/TimDettmers/bitsandbytes/issues''')/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//reckon.bytedance.net')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('8118'), PosixPath('//sys-proxy-rd-relay.byted.org')} warn(msg) RuntimeError: /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/logs')} warn(msg)

    CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
    If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
    https://github.com/TimDettmers/bitsandbytes/issues

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//workspace-proxy-candy-lq-tce.byted.org/p35fd73f3c83a328043cc3eec1607af56/proxy/{{port}}')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/vscode-git-ff6b75c69f.sock')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//lf6-config.bytetcc.com/obj/tcc-config-web')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/4/error.json')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in Traceback (most recent call last): File "src/train.py", line 25, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from peft import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! 
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/tiger/lite_sdk/cc/lib:/opt/tiger/ss_lib/so did not contain libcudart.so as expected! Searching further paths... warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/logs')} warn(msg) from .cextension import COMPILED_WITH_CUDA, lib /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//workspace-proxy-candy-lq-tce.byted.org/p35fd73f3c83a328043cc3eec1607af56/proxy/{{port}}')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/0/error.json')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

  1. CUDA not installed

  2. You have multiple conflicting CUDA libraries/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/toutiao/log/sys/stdout.log_')} warn(msg)

  3. Required library not pre-compiled for this bitsandbytes release! /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('byted.org,bytedance.net,.byted.org,.bytedance.net,localhost,.ecombdimg.com,.byteimg.com,127.0.0.1,'), PosixPath('/8,100.64.0.0/10,fe80'), PosixPath('/10,172.16.0.0/12,169.254.0.0/16,192.168.0.0/16'), PosixPath('1,10.0.0.0/8,127.0.0.0/8,fd00')} warn(msg) CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//workspace.byted.org'), PosixPath('https')} warn(msg) ================================================================================ /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/tez_deploy/conf')} warn(msg)

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/pid')} warn(msg) CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed!/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/toutiao/log/sys')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/lq_nas_workspace/63.82.57.63')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/mlx_notebook_pysdk/mlx-pysdk'), PosixPath('/tmp/mlx/workspace'), PosixPath('/opt/tiger/forge_toolbox')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/vscode-git-ff6b75c69f.sock')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//lf6-config.bytetcc.com/obj/tcc-config-web')} warn(msg)

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/3/error.json')} warn(msg)

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: Detected CUDA version 100
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 100
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
  1. CUDA driver not installed
CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?
  2. CUDA not installed
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!
================================================ERROR=====================================
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: CUDA detection failed! Possible reasons:
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
  1. CUDA driver not installed
================================================================================
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.
  4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: Setup Failed!
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!from peft import ( CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...

CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so

CUDA SETUP: Highest compute capability among GPUs detected: 7.0

CUDA SETUP: Detected CUDA version 100

CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?

CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================

CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.

CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.

CUDA SETUP: Setup Failed!

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. ================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so... Traceback (most recent call last):

================================================ERROR===================================== File "src/train.py", line 25, in

CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.

CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.

CUDA SETUP: Setup Failed!

    Traceback (most recent call last):
      File "src/train.py", line 25, in <module>
        from peft import (
      File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/__init__.py", line 22, in <module>
        from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
      File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in <module>
        from .peft_model import (
      File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in <module>
        from .tuners import (
      File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/__init__.py", line 21, in <module>
        from .lora import LoraConfig, LoraModel
      File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in <module>
        import bitsandbytes as bnb
      File "/usr/local/lib/python3.7/site-packages/bitsandbytes/__init__.py", line 7, in <module>
        from .autograd._functions import (
      File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
        from ._functions import undo_layout, get_inverse_transform_indices
      File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
        import bitsandbytes.functional as F
      File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in <module>
        from .cextension import COMPILED_WITH_CUDA, lib
      File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in <module>
        https://github.com/TimDettmers/bitsandbytes/issues''')
    RuntimeError:
        CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
        If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
        https://github.com/TimDettmers/bitsandbytes/issues

    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 649) of binary: /usr/bin/python3
    Traceback (most recent call last):
      File "/usr/local/bin/torchrun", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main
        run(args)
      File "/usr/local/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
        )(*cmd_args)
      File "/usr/local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/usr/local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
        failures=result.failures,
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

src/train.py FAILED

Failures:

[1]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 1 (local_rank: 1) exitcode : 1 (pid: 650) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[2]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 2 (local_rank: 2) exitcode : 1 (pid: 651) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[3]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 3 (local_rank: 3) exitcode : 1 (pid: 652) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[4]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 4 (local_rank: 4) exitcode : 1 (pid: 653) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[5]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 5 (local_rank: 5) exitcode : 1 (pid: 654) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[6]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 6 (local_rank: 6) exitcode : 1 (pid: 655) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[7]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 7 (local_rank: 7) exitcode : 1 (pid: 656) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):

[0]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 0 (local_rank: 0) exitcode : 1 (pid: 649) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

INFO[0072] Worker 0 Status Failed host="fdbd:dc03:1:337::43" message= reason=Error error: exec command: 0 ➜ train git:(main) ✗
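
The warnings in this log name two path problems: directories listed in the environment that do not exist, and `libcudart.so` not being found on any environment path. A minimal sketch for checking both by hand, assuming a Linux host and that `LD_LIBRARY_PATH` is the variable of interest. `scan_library_path` is a hypothetical helper written for this illustration and is not part of bitsandbytes.

```python
import os
from pathlib import Path


def scan_library_path(value):
    """Classify each entry of an LD_LIBRARY_PATH-style string.

    Hypothetical helper: it only looks at the filesystem and does not
    load any CUDA library.
    """
    report = {"missing": [], "with_cudart": [], "without_cudart": []}
    for entry in value.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        directory = Path(entry)
        if not directory.is_dir():
            report["missing"].append(entry)
        elif any(directory.glob("libcudart.so*")):
            report["with_cudart"].append(entry)
        else:
            report["without_cudart"].append(entry)
    return report


if __name__ == "__main__":
    result = scan_library_path(os.environ.get("LD_LIBRARY_PATH", ""))
    for kind, entries in result.items():
        print(kind, entries)
```

More than one entry under `with_cudart` would be a hint toward the "multiple conflicting CUDA libraries" reason listed in the log; an empty list matches the "not found in any environmental path" warning.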

yiyepiaoling0715, May 19 '23 03:05

How can this problem be solved? Is it a CUDA version issue?
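
On the CUDA version question: the log reports detected CUDA version 100 and compute capability 7.0, and then looks for `libbitsandbytes_cuda100_nocublaslt.so`. The sketch below reproduces that file name from those two values and lists which binaries a given bitsandbytes install directory contains. The naming rule is inferred from the log messages only, and both function names are hypothetical, so treat it as an illustration rather than the library's own logic.

```python
import importlib.util
from pathlib import Path


def expected_binary_name(cuda_version, compute_capability):
    """Guess the shared-library name the log says it is looking for.

    cuda_version: digits as printed in the log, e.g. "100" for CUDA 10.0.
    compute_capability: e.g. "7.0".
    Assumption (from the log): the cuBLASLt build is used only when the
    compute capability is at least 7.5 and CUDA is at least 11.
    """
    major, minor = (int(part) for part in compute_capability.split("."))
    has_cublaslt = (major, minor) >= (7, 5) and int(cuda_version) >= 110
    suffix = "" if has_cublaslt else "_nocublaslt"
    return f"libbitsandbytes_cuda{cuda_version}{suffix}.so"


def list_shipped_binaries(package_dir):
    """Return the sorted libbitsandbytes*.so file names found in package_dir."""
    return sorted(p.name for p in Path(package_dir).glob("libbitsandbytes*.so"))


if __name__ == "__main__":
    wanted = expected_binary_name("100", "7.0")
    print("wanted:", wanted)
    # find_spec locates the package on disk without importing it,
    # so the failing CUDA setup is not triggered here.
    spec = importlib.util.find_spec("bitsandbytes")
    if spec is None or not spec.submodule_search_locations:
        print("bitsandbytes is not installed in this interpreter")
    else:
        for location in spec.submodule_search_locations:
            shipped = list_shipped_binaries(location)
            print(location, shipped, "found" if wanted in shipped else "missing")
```

If the wanted file is missing from the install directory, that is consistent with reason 4 in the log (library not pre-compiled for this release) and with the "CUDA 10.0 not supported" message.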

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 from .peft_model import ( /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. from .lora import LoraConfig, LoraModelCUDA SETUP: Setup Failed!

File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. Traceback (most recent call last): File "src/train.py", line 25, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from peft import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from 
.peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in from .lora import LoraConfig, LoraModel import bitsandbytes.functional as F File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in

File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/tiger/lite_sdk/cc/lib:/opt/tiger/ss_lib/so did not contain libcudart.so as expected! Searching further paths... 
warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//ml.bytedance.net')} warn(msg) https://github.com/TimDettmers/bitsandbytes/issues''')/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//reckon.bytedance.net')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('8118'), PosixPath('//sys-proxy-rd-relay.byted.org')} warn(msg) RuntimeError: /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/logs')} warn(msg)

    CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
    If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
    https://github.com/TimDettmers/bitsandbytes/issues

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//workspace-proxy-candy-lq-tce.byted.org/p35fd73f3c83a328043cc3eec1607af56/proxy/{{port}}')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/vscode-git-ff6b75c69f.sock')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//lf6-config.bytetcc.com/obj/tcc-config-web')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/4/error.json')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in Traceback (most recent call last): File "src/train.py", line 25, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from peft import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! 
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/tiger/lite_sdk/cc/lib:/opt/tiger/ss_lib/so did not contain libcudart.so as expected! Searching further paths... warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/logs')} warn(msg) from .cextension import COMPILED_WITH_CUDA, lib /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//workspace-proxy-candy-lq-tce.byted.org/p35fd73f3c83a328043cc3eec1607af56/proxy/{{port}}')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/0/error.json')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg) CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

  1. CUDA not installed
  2. You have multiple conflicting CUDA libraries/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/toutiao/log/sys/stdout.log_')} warn(msg)
  3. Required library not pre-compiled for this bitsandbytes release! /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('byted.org,bytedance.net,.byted.org,.bytedance.net,localhost,.ecombdimg.com,.byteimg.com,127.0.0.1,'), PosixPath('/8,100.64.0.0/10,fe80'), PosixPath('/10,172.16.0.0/12,169.254.0.0/16,192.168.0.0/16'), PosixPath('1,10.0.0.0/8,127.0.0.0/8,fd00')} warn(msg) CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda. /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//workspace.byted.org'), PosixPath('https')} warn(msg)

    /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/tez_deploy/conf')} warn(msg)

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/data00/yarn/pid')} warn(msg) CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed!/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/toutiao/log/sys')} warn(msg)

/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/lq_nas_workspace/63.82.57.63')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/tiger/mlx_notebook_pysdk/mlx-pysdk'), PosixPath('/tmp/mlx/workspace'), PosixPath('/opt/tiger/forge_toolbox')} warn(msg) /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/vscode-git-ff6b75c69f.sock')} warn(msg) CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//lf6-config.bytetcc.com/obj/tcc-config-web')} warn(msg)

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0/usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_pi_tcj64/none_ljcat5ei/attempt_0/3/error.json')} warn(msg)

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...CUDA SETUP: Detected CUDA version 100

CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.soCUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

CUDA SETUP: Highest compute capability among GPUs detected: 7.0

CUDA SETUP: Detected CUDA version 100================================================ERROR=====================================

CUDA SETUP: CUDA detection failed! Possible reasons: /usr/local/lib/python3.7/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! warn(msg)

  1. CUDA driver not installedCUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?
  2. CUDA not installedCUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!================================================ERROR=====================================

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.CUDA SETUP: CUDA detection failed! Possible reasons:

CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.1. CUDA driver not installed

================================================================================2. CUDA not installed

  1. You have multiple conflicting CUDA libraries

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: Setup Failed!CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!from peft import ( CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...

CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so

CUDA SETUP: Highest compute capability among GPUs detected: 7.0CUDA SETUP: Highest compute capability among GPUs detected: 7.0

CUDA SETUP: Detected CUDA version 100CUDA SETUP: Detected CUDA version 100

CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?

CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================================================================ERROR=====================================

CUDA SETUP: CUDA detection failed! Possible reasons:CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed1. CUDA driver not installed
  2. CUDA not installed2. CUDA not installed
  3. You have multiple conflicting CUDA libraries3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release!4. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

================================================================================================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.

CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.

CUDA SETUP: Setup Failed!CUDA SETUP: Setup Failed!

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR===================================== CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!! CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so CUDA SETUP: Highest compute capability among GPUs detected: 7.0 CUDA SETUP: Detected CUDA version 100 CUDA SETUP: Required library version not found: libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source? CUDA SETUP: Defaulting to libbitsandbytes_cpu.so... Traceback (most recent call last):

================================================ERROR===================================== File "src/train.py", line 25, in

CUDA SETUP: CUDA detection failed! Possible reasons:

  1. CUDA driver not installed
  2. CUDA not installed
  3. You have multiple conflicting CUDA libraries
  4. Required library not pre-compiled for this bitsandbytes release! CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113. CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

    CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. CUDA SETUP: Setup Failed! CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version. CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables. Traceback (most recent call last): File "src/train.py", line 25, in from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in from peft import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in from .peft_model import (from peft import (

File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/init.py", line 22, in from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/mapping.py", line 16, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .peft_model import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/peft_model.py", line 31, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .tuners import ( File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/init.py", line 21, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from .lora import LoraConfig, LoraModel File "/mlx_devbox/users/zhengyuyu/workspace/code/BELLE/train/src/peft/tuners/lora.py", line 40, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from ._functions import 
undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes as bnb File "/usr/local/lib/python3.7/site-packages/bitsandbytes/init.py", line 7, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in from .autograd._functions import ( File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/init.py", line 1, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from ._functions import undo_layout, get_inverse_transform_indices File "/usr/local/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 9, in import bitsandbytes.functional as F File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issuesimport bitsandbytes.functional as F

File "/usr/local/lib/python3.7/site-packages/bitsandbytes/functional.py", line 17, in

from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in from .cextension import COMPILED_WITH_CUDA, lib File "/usr/local/lib/python3.7/site-packages/bitsandbytes/cextension.py", line 25, in https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues https://github.com/TimDettmers/bitsandbytes/issues''') RuntimeError: CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with detals about your environment: https://github.com/TimDettmers/bitsandbytes/issues ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 649) of binary: /usr/bin/python3 Traceback (most recent call last): File "/usr/local/bin/torchrun", line 8, in sys.exit(main()) File "/usr/local/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main run(args) File "/usr/local/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run )(*cmd_args) File "/usr/local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/usr/local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent failures=result.failures, torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

src/train.py FAILED

Failures:

[1]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 1 (local_rank: 1) exitcode : 1 (pid: 650) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[2]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 2 (local_rank: 2) exitcode : 1 (pid: 651) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[3]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 3 (local_rank: 3) exitcode : 1 (pid: 652) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[4]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 4 (local_rank: 4) exitcode : 1 (pid: 653) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[5]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 5 (local_rank: 5) exitcode : 1 (pid: 654) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[6]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 6 (local_rank: 6) exitcode : 1 (pid: 655) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

[7]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 7 (local_rank: 7) exitcode : 1 (pid: 656) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):

[0]: time : 2023-05-19_10:39:37 host : mlxlab4sumlnse6462ddf7-20230516013551-0elst2-xozv1d-worker rank : 0 (local_rank: 0) exitcode : 1 (pid: 649) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

INFO[0072] Worker 0 Status Failed host="fdbd:dc03:1:337::43" message= reason=Error error: exec command: 0 ➜ train git:(main) ✗

I'd still recommend using the provided Docker environment.
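
For anyone landing here with a similar log: the lines that matter are the detected CUDA version (`100`, i.e. 10.0), the compute capability (`7.0`), and the missing library name. Below is a minimal sketch for pulling those three values out of a captured log, so the mismatch can be confirmed before switching environments. The helper name `diagnose_bnb_log` and the `min_supported_major=11` default are my own assumptions for illustration; the threshold is taken from the "CUDA version lower than 11" message in the log above, not from the bitsandbytes source, and the regexes only match the wording seen in this particular log.

```python
import re
from typing import Any, Dict

# Patterns copied from the message text in the log above (assumption:
# other bitsandbytes releases may word these lines differently).
_CUDA_RE = re.compile(r"Detected CUDA version (\d+)")
_CC_RE = re.compile(r"Highest compute capability among GPUs detected: (\d+\.\d+)")
_LIB_RE = re.compile(r"Required library version not found: (\S+?\.so)")


def diagnose_bnb_log(log_text: str, min_supported_major: int = 11) -> Dict[str, Any]:
    """Hypothetical helper: summarize the CUDA SETUP lines of a captured log."""
    cuda = _CUDA_RE.search(log_text)
    cc = _CC_RE.search(log_text)
    lib = _LIB_RE.search(log_text)

    cuda_version = None
    if cuda:
        # The log prints the version without a dot: "100" -> (10, 0), "113" -> (11, 3).
        code = int(cuda.group(1))
        cuda_version = (code // 10, code % 10)

    return {
        "cuda_version": cuda_version,
        "compute_capability": cc.group(1) if cc else None,
        "missing_library": lib.group(1) if lib else None,
        # True when the detected major version is below the assumed threshold.
        "cuda_too_old": cuda_version is not None
        and cuda_version[0] < min_supported_major,
    }


if __name__ == "__main__":
    sample = (
        "CUDA SETUP: Highest compute capability among GPUs detected: 7.0 "
        "CUDA SETUP: Detected CUDA version 100 "
        "CUDA SETUP: Required library version not found: "
        "libbitsandbytes_cuda100_nocublaslt.so. Maybe you need to compile it from source?"
    )
    print(diagnose_bnb_log(sample))
```

If `cuda_too_old` comes back true, as it would for the log in this issue, rebuilding bitsandbytes is unlikely to help on its own, since the log itself says CUDA 10.0 is not supported; moving to an image with a newer CUDA toolkit, such as the provided Docker environment, is the simpler route.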

xianghuisun May 25 '23 04:05