ChatGLM-6B [Feature] Enable the GPU on M-series Macs

Is your feature request related to a problem? Please describe.

Running on the CPU on a Mac is very slow. PyTorch supports GPU acceleration on M-series Macs.

Solutions

Availability can be checked with torch.backends.mps.is_available()

Change .half().cuda() to .float().to("mps")
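The check and the device change above can be combined into one small helper. This is only a sketch: pick_device is a hypothetical name, not part of the ChatGLM-6B code, and it falls back to "cpu" when PyTorch is missing or no GPU backend is reported.

```python
import importlib.util


def pick_device() -> str:
    """Return "mps", "cuda" or "cpu" depending on what PyTorch reports.

    pick_device is a hypothetical helper name used for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate

    import torch

    # torch.backends.mps only exists in PyTorch builds that ship the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(device)
```

The returned string could then be passed to .to(device) instead of hard-coding .cuda() or .to("mps").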

Running it returns:

Is there a way to fix this?

Additional context

No response

Mar 19 '23 05:03 kingzeus

https://github.com/THUDM/ChatGLM-6B/issues/6#issuecomment-1474260291

int64 is supported on macOS 13.3 Beta, and you should also use the nightly build of pytorch.

I tried to use mps backend to run on gpu, but it seems to have a bug when calling the generate function.

Mar 19 '23 07:03 chaucerling

It seems to work!

Steps:

Change .half().cuda() to .float().to("mps")
In modeling_chatglm.py, comment out lines 33-37:

# flags required to enable jit fusion kernels
# torch._C._jit_set_profiling_mode(False)
# torch._C._jit_set_profiling_executor(False)
# torch._C._jit_override_can_fuse_on_cpu(True)
# torch._C._jit_override_can_fuse_on_gpu(True)

Change line 268 of modeling_chatglm.py to:

 dtype = attention_scores.dtype

Run it.

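There are still some warnings, but they do not seem to affect usage. In CPU mode a reply took roughly 300 s; with the changes above it takes only about 5-8 s. Each additional round of conversation increases memory use by about 2 GB.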
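To reproduce a comparison like the one above, a small timing wrapper is enough. A minimal sketch: timed is a hypothetical helper, and the lambda is a stand-in workload, since the real call would be a single chat request against the loaded model.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Stand-in workload (an assumption for illustration); in practice this
# would wrap one chat request against the model on "cpu" or "mps".
result, seconds = timed(lambda: sum(range(100_000)))
print(f"took {seconds:.4f} s")
```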

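What still needs improving: the model has to be downloaded locally before modeling_chatglm.py can be modified. With the current code structure there does not seem to be a good way around this.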
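One possible way to avoid hand-editing lines 33-37 would be to guard the JIT flags by platform rather than commenting them out. The sketch below assumes that skipping the flags on macOS is sufficient, which is what the manual edit above does. jit_fusion_supported and configure_jit_fusion are hypothetical names; the torch._C calls are the ones quoted in the steps.

```python
import sys


def jit_fusion_supported(platform: str = sys.platform) -> bool:
    """Assumption: the JIT fusion flags are only skipped on macOS ("darwin")."""
    return platform != "darwin"


def configure_jit_fusion() -> bool:
    """Set the JIT fusion flags where supported; return True if they were set."""
    if not jit_fusion_supported():
        return False
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, nothing to configure

    # The same four calls that the steps above comment out by hand.
    torch._C._jit_set_profiling_mode(False)
    torch._C._jit_set_profiling_executor(False)
    torch._C._jit_override_can_fuse_on_cpu(True)
    torch._C._jit_override_can_fuse_on_gpu(True)
    return True


print(jit_fusion_supported("darwin"), jit_fusion_supported("linux"))
```

With a guard like this in the model file, the same code could run unchanged on both CUDA machines and Macs.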

Mar 19 '23 11:03 kingzeus

@kingzeus Hello, which PyTorch and macOS versions did you use in your test environment?

Mar 19 '23 12:03 imClumsyPanda

macOS 13.2.1, torch 2.0.0, torchaudio 2.0.0.dev20230313, torchvision 0.15.1

Mar 20 '23 06:03 kingzeus

@kingzeus How can I get around the cpm_kernels error RuntimeError: Unknown platform: darwin?

Mar 21 '23 22:03 LeeeSe

For now, the simplest approach is not to call the quantization function and not to use the int4 model.
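That advice can be written as a small guard that decides which checkpoint to load. This is a sketch under assumptions: choose_checkpoint is a hypothetical helper, and the two repository names are assumed to be the unquantized and int4 checkpoints on the Hugging Face Hub.

```python
import sys

# Assumed Hugging Face Hub repository names for the two checkpoints.
FULL_CHECKPOINT = "THUDM/chatglm-6b"
INT4_CHECKPOINT = "THUDM/chatglm-6b-int4"


def choose_checkpoint(want_int4: bool, platform: str = sys.platform) -> str:
    """Pick a checkpoint, avoiding the int4 model on macOS.

    Per the advice above, the int4 model and the quantization function
    are avoided on "darwin" so the cpm_kernels error is not triggered.
    """
    if want_int4 and platform != "darwin":
        return INT4_CHECKPOINT
    return FULL_CHECKPOINT


print(choose_checkpoint(want_int4=True, platform="darwin"))
```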

Mar 22 '23 02:03 kingzeus

@kingzeus Thanks for the method you provided. We have updated modeling_chatglm.py on the HF Hub, so it can now be run directly. In addition, changing .float() to .half() saves memory.
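A rough back-of-the-envelope estimate shows why .half() helps. The sketch assumes about 6 billion parameters (taken from the model's name, not a measured figure) and counts only the weights, ignoring activations and caches.

```python
def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 6e9  # assumption: roughly 6 billion parameters, per the model name

fp32 = weight_gib(NUM_PARAMS, 4)  # .float() stores 4 bytes per parameter
fp16 = weight_gib(NUM_PARAMS, 2)  # .half() stores 2 bytes per parameter
print(f"float32: {fp32:.1f} GiB, float16: {fp16:.1f} GiB")
```

Under these assumptions the weights take roughly 22 GiB in float32 and about half that in float16.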

Mar 23 '23 14:03 duzx16

But running python web_demo.py directly runs into this error (the cpm_kernels RuntimeError: Unknown platform: darwin discussed above, where the suggestion was to avoid the quantization function and the int4 model). How can I avoid it?

Mar 27 '23 09:03 tedyyu
