MOSS issues

Recommendation: use AutoGPTQ or AutoGPTQ-triton for model quantization and inference

13

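I have seen many people in the issues run into problems when running inference with the quantized models, so I would like to shamelessly plug my own new project here in the hope that it helps. [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) is an easy-to-use toolkit for model quantization and inference. It provides a `Hugging Face transformers`-style API and is fully compatible with its `TextGenerationPipeline`. It currently supports quantization and inference for four mainstream open-source GPT model families, and can be extended to other GPT models with only a few lines of code. The project also ships several predefined task types that make it easy to evaluate a model on specific downstream tasks. [AutoGPTQ-triton](https://github.com/qwopqwop200/AutoGPTQ-triton) is a triton inference version built on top of AutoGPTQ by the owner of the [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) project. If your operating system can run triton, AutoGPTQ-triton is recommended for faster inference; if you cannot use triton, AutoGPTQ...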

PanQiWei

Model quantization code runs successfully on RTX3080 + WSL; summary of errors below.

6

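1. The quantized model depends on the triton library, so inference only works on Linux; I used WSL with Ubuntu 22.04. A plain `pip install triton` also fails: the version installed by default carries a post1 suffix, and the later code errors out with it, so pin the version at install time with `pip install triton==2.0.0`. 2. Once the triton problem is solved, if you hit errors related to Python.h, install the headers with `sudo apt-get install python3.8-dev`. 3. I also tried changing the model path in moss_cli_demo.py and moss_gui_demo.py; both fail because an index.json file is missing. The non-quantized model is split into several checkpoints, so its folder contains an index file that the quantized model does not have. This problem is still unresolved. I am writing down the problems I remember for now; if anyone with the same setup runs into other problems, feel free to leave a comment.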
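The version check from point 1 can be sketched in a few lines. This is only an illustration: the helper names `has_post_suffix` and `installed_triton_version` are made up for this example, and the only facts taken from the post are that a `post1` build caused errors and that `2.0.0` worked.

```python
import re
from importlib import metadata
from typing import Optional

# Matches a PEP 440 post-release suffix such as ".post1" at the end of a version.
_POST_SUFFIX = re.compile(r"\.post\d+$")


def has_post_suffix(version: str) -> bool:
    """Return True if the version string ends in a post-release suffix."""
    return _POST_SUFFIX.search(version.strip()) is not None


def installed_triton_version() -> Optional[str]:
    """Return the installed triton version, or None if triton is not installed."""
    try:
        return metadata.version("triton")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_triton_version()
    if version is None:
        print("triton is not installed")
    elif has_post_suffix(version):
        print(f"triton {version} is a post release; the post above pins triton==2.0.0")
    else:
        print(f"triton {version}")
```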

SitaraJin

--deepspeed_multinode_launcher: command not found

6

I followed the official tutorial for training. When I start the training script, it reports `--deepspeed_multinode_launcher: command not found`. The launch script is configured as follows: num_machines=4 num_processes=$((num_machines * 8)) machine_rank=0 accelerate launch \ --config_file ./configs/sft.yaml \ --num_processes $num_processes \ --num_machines $num_machines \ --machine_rank $machine_rank \ --deepspeed_multinode_launcher standard finetune_moss.py \ --model_name_or_path /root/liuliu/moss/moss-moon-003-sft-plugin \...
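The thread preview does not say what caused this. One common reason a shell reports an option as "command not found" is a broken line continuation: if a trailing backslash is followed by a space or tab, the command ends there and the next line is run as a command of its own. The sketch below scans a script for that pattern. It is a guess at the cause rather than a confirmed diagnosis, and `find_broken_continuations` is a hypothetical helper written for this example.

```python
import re
from typing import List

# A backslash followed by spaces or tabs at the end of a line does not
# continue the line in a POSIX shell; the backslash must be the last character.
_BROKEN = re.compile(r"\\[ \t]+$")


def find_broken_continuations(script: str) -> List[int]:
    """Return 1-based line numbers where a trailing backslash is followed by whitespace."""
    return [
        number
        for number, line in enumerate(script.splitlines(), start=1)
        if _BROKEN.search(line)
    ]
```

Running it over the contents of the launch script returns the line numbers to inspect. An empty list means this particular cause can be ruled out, in which case a missing backslash on the preceding line is the next thing to check.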

168liuliu168

Deployment error: module 'torch' has no attribute 'float32'

3

I followed the README tutorial exactly. The environment is Anaconda + Python 3.8 with an A100 80G GPU, and the dependencies match the versions in the repository's requirements.txt, but step 3 fails with: ```python Python 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55) [GCC 11.3.0] on linux Type...

zengxs

help wanted

INT8 quantized model fails when a specific GPU is selected: Triton Error [CUDA]: an illegal memory access was encountered

2

I am deploying the moss-moon-003-sft-int8 model on a multi-GPU machine and run the following: ``` model = AutoModelForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int8", trust_remote_code=True).half().cuda(1) inputs="blabla" for k in inputs: inputs[k] = inputs[k].cuda(1) ``` The call `outputs = model.generate(**inputs, do_sample=True, temperature=0.7, top_p=0.8, repetition_penalty=1.02, max_new_tokens=256)` then fails with **Triton Error [CUDA]: an illegal memory...
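One workaround that is often tried when code misbehaves on a non-default GPU is to expose only the desired card to the process, so that it shows up as device 0 and no explicit index is needed. The sketch below does this through the `CUDA_VISIBLE_DEVICES` environment variable. Whether it resolves this particular Triton error is an assumption, and `pin_visible_gpu` is a hypothetical helper name.

```python
import os


def pin_visible_gpu(index: int) -> str:
    """Expose a single physical GPU to this process.

    Must be called before anything initializes CUDA (for example, before
    the first .cuda() call), otherwise the setting has no effect.
    """
    value = str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Physical GPU 1 becomes the only visible device, so later code would call
# .cuda() without an index instead of .cuda(1).
pin_visible_gpu(1)
```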

Ratuchetp

RuntimeError: CUDA out of memory

5

The error occurs on the first run after installation, on Ubuntu 22.04 with two NVIDIA T4 cards. Do I need to switch to a different quantization level? $ python moss_cli_demo.py Fetching 17 files: 100%|██████████| 17/17 [00:00 ... If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
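A rough back-of-the-envelope estimate helps answer the quantization question. The sketch below computes the memory needed for the weights alone at different precisions. The 16-billion parameter count is an assumption taken from the MOSS model card, not from this thread, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# ASSUMPTION: roughly 16e9 parameters (from the MOSS model card, not this thread).
N_PARAMS = 16e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under that assumption the fp16 weights come to about 29.8 GiB, which leaves almost no headroom on two 16 GB T4 cards, while the 8-bit and 4-bit weights are about 14.9 GiB and 7.5 GiB respectively.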

ImGoodBai

Where is China's next ChatGPT?

chengong2020

Errors when using the fnlp/moss-moon-003-sft-int4 and fnlp/moss-moon-003-sft-int8 models

5

My GPU has 32G of memory, so I changed the model reference in moss_gui_demo.py to fnlp/moss-moon-003-sft-int4 and then to fnlp/moss-moon-003-sft-int8; both fail. ~/MOSS$ python moss_gui_demo.py Downloading pytorch_model.bin: 100%|██████████| 10.8G/10.8G [04:29

ImGoodBai