Error when running the GLM-4 demo
os: ubuntu 20.04, cuda: 11.8, torch: 2.1.0, nvidia driver version: 470, transformers: 4.40.0, accelerate: 0.31.0
When I run the GLM-4 code, I get the error below. I am not sure whether my driver version is too low.
key code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# MODEL_PATH is the local GLM-4 checkpoint directory (defined elsewhere in the script)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    encode_special_tokens=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    load_in_4bit=True
).to(device).eval()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/quane_model.py", line 18, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 400, in set_module_tensor_to_device
new_value = value.to(device)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
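As a side note, the deprecation warning above asks for a BitsAndBytesConfig instead of load_in_4bit=True. A minimal sketch of that change (assuming bitsandbytes is installed and MODEL_PATH is the same checkpoint path as above; device_map="cuda:0" is my guess at the placement, not something from the original script):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Express the 4-bit load through quantization_config instead of the
# deprecated load_in_4bit argument.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="cuda:0",  # place weights directly; .to() is not supported on 4-bit models
).eval()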
Your driver version is indeed quite old; if possible, try updating it and check whether that helps. But in your case it might be something else, as the error message suggests that the GPU is occupied. Can you check whether something else is using the GPU when you run the code?
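A quick way to check (a minimal sketch in plain Python; nvidia-smi shows driver-level processes, the torch calls show what PyTorch itself sees):

import subprocess
import torch

# Processes currently holding the GPU, from the driver's point of view.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# What PyTorch sees.
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())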
When I exit my Docker container, it can load the models, but I then hit the same error. It doesn't seem like an accelerate issue, though.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 10/10 [00:00<00:00, 15.13it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in ...
[traceback truncated in the original paste; it ends with the same CUDA error and the "Compile with TORCH_USE_CUDA_DSA to enable device-side assertions." hint as above]
I don't think it's an issue with Docker; rather, another process is occupying your GPU, which is why PyTorch cannot use it properly. At least that is what the error message suggests. I would expect the same error to occur without accelerate.
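One way to confirm that: run a bare allocation with nothing but PyTorch (a minimal sketch; if this fails with the same "busy or unavailable" error, neither transformers nor accelerate is involved):

import torch

# Smallest possible CUDA allocation; fails with the same error if another
# process (or exclusive compute mode) is blocking the GPU.
x = torch.ones(1).to("cuda:0")
print(x)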