ChatGLM-6B
Is there a better way to speed up single-machine multi-GPU inference with Hugging Face + ChatGLM-6B?
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
In practice, the approach from utils.py on the project page:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=4)
```

has almost exactly the same inference speed as the approach from https://huggingface.co/docs/transformers/perf_infer_gpu_one:

```python
max_memory_mapping = {0: "30GB", 1: "30GB", 2: "30GB", 3: "30GB"}
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map="auto",
    max_memory=max_memory_mapping,
).half()
```
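A quick way to check whether the two approaches end up with the same layer placement (which would explain the identical speed) is to print the device map that accelerate records on the loaded model; a minimal sketch, assuming either `model` object from above:

```python
# Sketch: inspect where accelerate placed each submodule. If both loading
# paths produce a similar map (layers split across GPUs 0-3), the identical
# inference speed follows: a forward pass visits the GPUs one after another.
print(model.hf_device_map)
```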
Expected Behavior
No response
Steps To Reproduce
None
Environment
None
Anything else?
No response
How can this script be used for single-machine multi-GPU inference?
The purpose of this script is to run the model across multiple GPUs that each have limited VRAM; it does not use multiple GPUs to speed up inference.
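For context, a simplified sketch of the idea behind such a script (not the repository's exact utils.py): each transformer layer is pinned to one GPU and accelerate dispatches the weights. Because a token's forward pass must traverse the layers in order, the GPUs compute sequentially, which adds memory capacity but not speed. The module names below assume chatglm-6b's layout.

```python
# Simplified sketch (assumptions: 28 transformer layers, chatglm-6b
# module names such as "transformer.layers.{i}").
from accelerate import dispatch_model
from transformers import AutoModel

def naive_device_map(num_gpus: int, num_layers: int = 28) -> dict:
    # Keep the embeddings, final layernorm, and LM head on GPU 0, and
    # spread the transformer layers evenly over the available GPUs.
    device_map = {
        "transformer.word_embeddings": 0,
        "transformer.final_layernorm": 0,
        "lm_head": 0,
    }
    per_gpu = (num_layers + num_gpus - 1) // num_gpus
    for i in range(num_layers):
        device_map[f"transformer.layers.{i}"] = i // per_gpu
    return device_map

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half()
model = dispatch_model(model, device_map=naive_device_map(num_gpus=4))
```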
@duzx16 So you mean the model doesn't fit in a single GPU's memory, so it has to be spread across several GPUs? Do you happen to know how to train the model with data parallelism on ChatGLM?
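One common route for data-parallel training is PyTorch DDP launched with torchrun; a generic sketch (not ChatGLM-specific guidance from this thread), where every GPU holds a full model copy and gradients are all-reduced across ranks:

```python
# Minimal DDP sketch; launch with: torchrun --nproc_per_node=4 train.py
# Generic PyTorch data parallelism, not ChatGLM-specific code. Note that a
# 6B model may not fit one full copy per GPU; ZeRO/DeepSpeed or parameter-
# efficient tuning (e.g. LoRA/P-Tuning) may be needed in practice.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModel

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True
).half().cuda(local_rank)
model = DDP(model, device_ids=[local_rank])
# ... then build a DataLoader with a DistributedSampler and run the usual
# forward/backward/step loop; each rank trains on a different data shard.
```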
Have you ever run into this error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
My original loading code was `model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map='auto').half().cuda()`, and it raised the same error you describe. After I removed the `.cuda()`, the problem went away. Hope this helps.
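In other words, something like the following should load cleanly; a sketch of the corrected call (`model_path` as in the comment above):

```python
# With device_map="auto", accelerate has already placed each submodule on a
# specific GPU; a trailing .cuda() moves everything onto the current device
# and conflicts with that placement, hence the two-devices error.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    model_path,            # path to the chatglm-6b weights
    trust_remote_code=True,
    device_map="auto",
).half()                   # no .cuda() here
model.eval()
```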