ChatGLM3
chatglm3-6b-32k runs out of memory when processing long text
System Info
CUDA 11.7, transformers 4.37.2, Python 3.10
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Reproduction
1. The GPUs are DGX-A800-80G.
2. `export CUDA_VISIBLE_DEVICES=1,2`
3. Run the following script:
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm3-6b-32k"  # local path or HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",          # shard the fp16 weights across the visible GPUs
    torch_dtype=torch.float16,
)
model.eval()

query = "balabala"
ids = tokenizer.encode(query, add_special_tokens=True)
print(len(ids))  # roughly 30k tokens
input_ids = torch.LongTensor([ids]).to(model.device)  # move the prompt to the model's first device

generated_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=16,
    # min_new_tokens=len(target_new_id),
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens from each returned sequence before decoding
generated_ids = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
The run fails with:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.18 GiB (GPU 0; 79.35 GiB total capacity; 46.59 GiB already allocated; 11.25 GiB free; 66.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
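For what it's worth, the suggestion in the traceback itself can be tried first. A minimal sketch, assuming the environment variable is set before PyTorch initializes its CUDA allocator; the 128 MiB split size is an illustrative guess, not a verified fix:

```python
import os

# Must be set before the CUDA allocator is initialized
# (the value 128 is an arbitrary starting point; tune as needed).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the env var is in place
```

Fragmentation can only explain part of the gap here, though, since the failed allocation itself is 20.18 GiB.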
Expected behavior
The input is about 30k tokens long, and the model is already loaded across two cards, yet it still reports out-of-memory. Could someone please help? Thanks!
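As a rough back-of-the-envelope sketch of why a 30k-token prompt can exhaust even an 80 GiB card: the head/layer counts below are assumptions taken from the published chatglm2/3-6b configs, and the sketch assumes full fp16 n×n attention-score matrices get materialized (no memory-efficient attention kernel). Note that `device_map="auto"` shards the *weights* across cards, but each layer's activations still live on a single GPU.

```python
# Illustrative arithmetic only; architecture numbers are assumptions.
n = 30_000                     # prompt tokens (from the report)
heads, layers = 32, 28
kv_groups, head_dim = 2, 128   # multi-query attention
fp16 = 2                       # bytes per element

# KV cache for the whole prompt: modest.
kv = 2 * layers * kv_groups * head_dim * n * fp16
print(f"KV cache: {kv / 1024**3:.2f} GiB")               # ~0.80 GiB

# A single layer's full fp16 attention-score matrix: enormous.
scores = heads * n * n * fp16
print(f"Scores, one layer: {scores / 1024**3:.1f} GiB")  # ~53.6 GiB
```

The quadratic term dominates, which is consistent with the single 20.18 GiB allocation attempt in the traceback, even if it doesn't reproduce that exact figure.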
The longer the input, the more GPU memory it uses. How much memory do your two cards have in total?
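(If it helps to confirm, the capacity of each visible card can be printed with a short snippet like this:)

```python
import torch

# Report the total memory of every CUDA device PyTorch can see.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```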