Zhengxiao Du comments

Results: 163 comments of Zhengxiao Du

[Feature] Quantize the model on the CPU first, then copy the quantized model to the GPU

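> For this requirement you can use the quantized models directly: https://huggingface.co/THUDM/chatglm-6b-int4 and https://huggingface.co/THUDM/chatglm-6b-int8 However, inference on the GPU after quantization also needs kernels written in CUDA, so I think it may not work. To solve this, the CUDA kernels would still have to be ported to ROCm.
>
> Kernels written in CUDA are not actually required. I ported chatglm to GPU inference with directml as the backend, and it runs successfully. There is the privateuse1 issue, of course, but that can be solved by compiling torch from source.

OK. If there is a complete implementation, it can be added to the list of friendly links.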

Is there a better way to speed up single-machine multi-GPU inference with huggingface + chatglm6b?

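The purpose of this script is to make the model usable across several GPUs with small memory. It does not use multiple GPUs to speed up inference.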
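To illustrate the difference between splitting a model for memory and running it in parallel for speed, here is a minimal sketch of a layer-to-GPU assignment. The function `split_layers`, the key format `layers.{i}`, and the layer count in the usage line are all hypothetical and chosen for illustration; this is not the script's actual code.

```python
from typing import Dict


def split_layers(num_layers: int, num_gpus: int) -> Dict[str, int]:
    """Assign consecutive layers to GPUs in roughly equal contiguous chunks.

    Hypothetical helper: each layer lives on exactly one GPU, so a forward
    pass visits the GPUs one after another. This lowers the memory needed
    per GPU but does not make inference faster.
    """
    if num_layers <= 0 or num_gpus <= 0:
        raise ValueError("num_layers and num_gpus must be positive")
    device_map = {}
    for i in range(num_layers):
        # Integer arithmetic maps layer i to a GPU index in [0, num_gpus).
        device_map[f"layers.{i}"] = i * num_gpus // num_layers
    return device_map


# Illustrative values only: 28 layers spread over 2 GPUs.
layer_map = split_layers(28, 2)
```

Because the layers run sequentially, only one GPU is busy at any moment, which is why this kind of placement helps with memory but not with latency.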

[BUG/Help] <title>LLVM ERROR: Failed to infer result type(s).

You need to install PyTorch-nightly to use MPS backend. PyTorch 2.0 is not enough.
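A quick way to see which backend the installed PyTorch build can actually use is sketched below. The helper name `pick_device` is hypothetical, and the check only reports whether MPS is available; it does not tell you whether the build is recent enough to avoid the error above.

```python
import importlib.util


def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what is available."""
    # Fall back to CPU when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # Older builds may not expose torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```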

[BUG] Changing num_beams from 1 to 2 gives RuntimeError: probability tensor contains either inf, nan or element < 0

Cannot reproduce this. Are you using chatglm-6b or the quantized version?

[BUG/Help] <keyerror>

Are you sure you are using `transformers 4.28.1`?
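One way to confirm the installed version is a small check with the standard library, sketched below. The helper name `installed_version` is hypothetical; the expected version string is the one mentioned above.

```python
from importlib import metadata
from typing import Optional

EXPECTED = "4.28.1"  # version mentioned in the comment above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif found != EXPECTED:
    print(f"transformers {found} is installed, expected {EXPECTED}")
else:
    print(f"transformers {found} matches")
```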

[BUG/Help] <title>

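Please follow the deployment instructions at https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning#%E6%A8%A1%E5%9E%8B%E9%83%A8%E7%BD%B2 to deploy the model. The information provided so far is too limited for us to offer further help.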

[BUG/Help] <title>

> ```
> import os
> import platform
> import signal
> from transformers import AutoTokenizer, AutoModel, AutoConfig
> import torch
> tokenizer = AutoTokenizer.from_pretrained("ptuning/output/adgen-chatglm-6b-pt-128-2e-2/checkpoint-1000", trust_remote_code=True)
> config...
> ```

[BUG/Help] <title>

> import os
> import torch
> from transformers import AutoConfig, AutoModel, AutoTokenizer
>
> CHECKPOINT_PATH = "./output/adgen-chatglm-6b-pt-8-1e-2-dev/checkpoint-3000"
> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
> config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
> model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)...

[BUG/Help] <title>

> The checkpoint was generated by train.sh. How do I tell the old version from the new one?

By file size: anything above 3 GB is the old version.
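The size rule can be checked with a few lines of Python. This is a minimal sketch: the helper name `looks_like_old_checkpoint` is hypothetical, the threshold assumes "3 GB" means 3 × 10^9 bytes, and the path you pass in is whatever weight file sits in your own checkpoint directory.

```python
import os

# Assumption: "3 GB" is taken as 3 * 10**9 bytes.
THRESHOLD_BYTES = 3 * 10**9


def looks_like_old_checkpoint(path: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """Return True when the file is larger than the threshold.

    Per the rule above, a file above the threshold is an old-style checkpoint.
    """
    return os.path.getsize(path) > threshold
```

For example, `looks_like_old_checkpoint("path/to/your/weight/file")` returns `True` for a file above the threshold.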

[Feature] Could a REST API be provided?

`api.py` in the repository provides a simple API implementation, which you can modify to suit your needs.
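A client for such a service might look like the sketch below. It assumes the server listens at `http://127.0.0.1:8000` and accepts a JSON body with `prompt` and `history` fields; the address, the field names, and the helper names are all assumptions, so check them against your copy of `api.py` before relying on them.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed address; adjust to wherever your server actually listens.
API_URL = "http://127.0.0.1:8000"


def build_request(prompt: str, history: Optional[List] = None,
                  url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the prompt and chat history as JSON."""
    # The "prompt" and "history" field names are assumptions for illustration.
    payload = {"prompt": prompt, "history": history or []}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


def ask(prompt: str, history: Optional[List] = None) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, history)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or adapt if you change the fields the server expects.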