ChatGLM-6B

Is there a better way to speed up single-machine multi-GPU inference with huggingface + chatglm-6b?

Open LivinLuo1993 opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

In my tests, using the helper from utils.py on the project homepage:

```python
from utils import load_model_on_gpus

model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=4)
```

performs the same as the approach from https://huggingface.co/docs/transformers/perf_infer_gpu_one:

```python
from transformers import AutoModel

max_memory_mapping = {0: "30GB", 1: "30GB", 2: "30GB", 3: "30GB"}
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map='auto',
    max_memory=max_memory_mapping,
).half()
```

The inference speed is almost exactly the same.
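Identical speed is expected here: both `load_model_on_gpus` and `device_map='auto'` with `max_memory` shard the model's layers across GPUs (model parallelism), so each token still passes through every layer in sequence. As a side note, the per-GPU memory map from the Transformers docs can be built programmatically; a minimal sketch (the helper name `build_max_memory` is my own, not part of either API):

```python
def build_max_memory(num_gpus, per_gpu="30GB"):
    """Build a per-GPU max_memory mapping like the one in the Transformers docs."""
    return {i: per_gpu for i in range(num_gpus)}

max_memory_mapping = build_max_memory(4)
# {0: '30GB', 1: '30GB', 2: '30GB', 3: '30GB'}
```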

Expected Behavior

No response

Steps To Reproduce

None

Environment

None

Anything else?

No response

LivinLuo1993 avatar Jun 21 '23 10:06 LivinLuo1993

I wonder whether the official team could adapt it to something like

BrightXiaoHan avatar Jun 22 '23 03:06 BrightXiaoHan

How does the script do multi-GPU inference on a single machine?

VictoryBlue avatar Jun 23 '23 03:06 VictoryBlue

The purpose of this script is to run the model on multiple GPUs that each have too little memory, not to accelerate inference with multiple GPUs.

duzx16 avatar Jun 25 '23 08:06 duzx16

@duzx16 So you mean the model can't fit into a single GPU's memory, and therefore has to be spread across several GPUs? Do you happen to know how to train the model with data parallelism on chatglm?

VictoryBlue avatar Jun 25 '23 08:06 VictoryBlue

Have you run into this error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

xiaojidaner avatar Jul 05 '23 07:07 xiaojidaner

My model-loading code used to be `model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map='auto').half().cuda()`, and it raised the same error as yours. After I removed the `.cuda()`, the problem went away. You could give that a try.

yingzhang0709 avatar Aug 24 '23 08:08 yingzhang0709