ChatGLM3
chatglm3-6b-32k runs out of memory when processing long text
System Info
CUDA 11.7, transformers 4.37.2, Python 3.10
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Reproduction
1. The GPUs are DGX-A800-80G.
2. `export CUDA_VISIBLE_DEVICES=1,2`
3. Run the following script:
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm3-6b-32k"  # local path or HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",          # shard the fp16 weights across the visible GPUs
    torch_dtype=torch.float16,
)
model.eval()

query = "balabala"
ids = tokenizer.encode(query, add_special_tokens=True)
print(len(ids))  # roughly 30k tokens
input_ids = torch.LongTensor([ids]).to(model.device)  # move the prompt to the model's first device

generated_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=16,
    # min_new_tokens=len(target_new_id),
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens from each returned sequence before decoding
generated_ids = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
The run fails with:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.18 GiB (GPU 0; 79.35 GiB total capacity; 46.59 GiB already allocated; 11.25 GiB free; 66.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
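For what it's worth, the suggestion in the traceback itself can be tried first. A minimal sketch, assuming the environment variable is set before PyTorch initializes its CUDA allocator; the 128 MiB split size is an illustrative guess, not a verified fix:

```python
import os

# Must be set before the CUDA allocator is initialized
# (the value 128 is an arbitrary starting point; tune as needed).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the env var is in place
```

Fragmentation can only explain part of the gap here, though, since the failed allocation itself is 20.18 GiB.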
Expected behavior
The input is about 30k tokens long, and the model is already loaded across two cards, yet it still reports out-of-memory. Could someone please help? Thanks!
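As a rough back-of-the-envelope sketch of why a 30k-token prompt can exhaust even an 80 GiB card: the head/layer counts below are assumptions taken from the published chatglm2/3-6b configs, and the sketch assumes full fp16 n×n attention-score matrices get materialized (no memory-efficient attention kernel). Note that `device_map="auto"` shards the *weights* across cards, but each layer's activations still live on a single GPU.

```python
# Illustrative arithmetic only; architecture numbers are assumptions.
n = 30_000                     # prompt tokens (from the report)
heads, layers = 32, 28
kv_groups, head_dim = 2, 128   # multi-query attention
fp16 = 2                       # bytes per element

# KV cache for the whole prompt: modest.
kv = 2 * layers * kv_groups * head_dim * n * fp16
print(f"KV cache: {kv / 1024**3:.2f} GiB")               # ~0.80 GiB

# A single layer's full fp16 attention-score matrix: enormous.
scores = heads * n * n * fp16
print(f"Scores, one layer: {scores / 1024**3:.1f} GiB")  # ~53.6 GiB
```

The quadratic term dominates, which is consistent with the single 20.18 GiB allocation attempt in the traceback, even if it doesn't reproduce that exact figure.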
The longer the input, the more GPU memory it uses. How much memory do your two cards have in total?
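(If it helps to confirm, the capacity of each visible card can be printed with a short snippet like this:)

```python
import torch

# Report the total memory of every CUDA device PyTorch can see.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```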