os: Ubuntu 16.04
transformers: 4.37.0
torch: 2.0.1+cu118 (CUDA 11.8)
GPU: 8 * V100 32 GB
Running the code throws an error that looks like running out of GPU memory, but each GPU only seems to be able to use about 20 GB. Why is that?
Code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"  # the device to load the inputs onto

# Load the checkpoint sharded across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    "/media/sdc/zhulei/Qwen1.5-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("/media/sdc/zhulei/Qwen1.5-72B-Chat")

prompt = "Please write a C++ demo that uses a thread pool to process tasks"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated part is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Error:
Traceback (most recent call last):
  File "/media/sdc/NLP/Qwen/qwen1_5_infer.py", line 24, in <module>
    generated_ids = model.generate(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/generation/utils.py", line 1520, in generate
    return self.sample(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/generation/utils.py", line 2617, in sample
    outputs = self(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1173, in forward
    outputs = self.model(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1058, in forward
    layer_outputs = decoder_layer(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 773, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 260, in forward
    query_states = self.q_proj(hidden_states)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/anaconda3/envs/Qwen_1.5/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
No idea about the 20 GB per GPU, but by the way, bf16 is not supported on V100.
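Two notes on that. First, ~20 GB per card is roughly what the weights alone should occupy: 72B parameters at 2 bytes each is about 144 GB, and device_map="auto" spreads that evenly over the 8 GPUs, leaving headroom for activations and the KV cache. Second, since V100 (compute capability 7.0) lacks native bf16 support, a minimal sketch worth trying is to load in float16 instead, and optionally set an explicit per-GPU budget via max_memory; the 30GiB figure below is an assumption for 32 GB cards, not a verified setting:

from transformers import AutoModelForCausalLM
import torch

# Sketch: fp16 instead of bf16 (V100 has no bf16 tensor-core support),
# plus an explicit per-GPU memory cap for accelerate's "auto" placement.
# 30GiB per 32 GB card is an assumed value; tune it for your setup.
model = AutoModelForCausalLM.from_pretrained(
    "/media/sdc/zhulei/Qwen1.5-72B-Chat",
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={i: "30GiB" for i in range(8)},  # 8 * V100 32 GB
)

With this loading change, the rest of the original script can stay as it is.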