Error when running the GLM-4 demo
os: ubuntu 20.04, cuda: 11.8, torch: 2.1.0, nvidia driver version: 470, transformers: 4.40.0, accelerate: 0.31.0
When I run the GLM-4 code, I get the error below. I am not sure whether my driver version is too low.
key code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# MODEL_PATH is the local GLM-4 checkpoint directory (defined elsewhere in the script)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    encode_special_tokens=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    load_in_4bit=True
).to(device).eval()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/quane_model.py", line 18, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 400, in set_module_tensor_to_device
new_value = value.to(device)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
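As a side note, the deprecation warning above asks for a BitsAndBytesConfig instead of load_in_4bit=True. A minimal sketch of that change (assuming bitsandbytes is installed and MODEL_PATH is the same checkpoint path as above; device_map="cuda:0" is my guess at the placement, not something from the original script):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Express the 4-bit load through quantization_config instead of the
# deprecated load_in_4bit argument.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="cuda:0",  # place weights directly; .to() is not supported on 4-bit models
).eval()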
Your driver version is indeed quite old; if possible, try updating it and check whether that helps. But in your case it might be something else, as the error message suggests that the GPU is occupied. Can you check whether something else is using the GPU when you run the code?
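A quick way to check (a minimal sketch in plain Python; nvidia-smi shows driver-level processes, the torch calls show what PyTorch itself sees):

import subprocess
import torch

# Processes currently holding the GPU, from the driver's point of view.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# What PyTorch sees.
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())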
When I exit my Docker container, it can load the models, but I then hit the same error. It doesn't seem like an accelerate issue, though.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 10/10 [00:00<00:00, 15.13it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in ...
[traceback truncated in the original paste; it ends with the same CUDA error and the "Compile with TORCH_USE_CUDA_DSA to enable device-side assertions." hint as above]
I don't think it's an issue with Docker; rather, another process is occupying your GPU, which is why PyTorch cannot use it properly. At least that is what the error message suggests. I would expect the same error to occur without accelerate.
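One way to confirm that: run a bare allocation with nothing but PyTorch (a minimal sketch; if this fails with the same "busy or unavailable" error, neither transformers nor accelerate is involved):

import torch

# Smallest possible CUDA allocation; fails with the same error if another
# process (or exclusive compute mode) is blocking the GPU.
x = torch.ones(1).to("cuda:0")
print(x)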