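MOSS: using the quantized moss-moon-003-sft-int8 model fails with the error: is not a folder containing a `.index.json`

I checked the full file listing, and there is indeed no xxx.index.json file: https://huggingface.co/fnlp/moss-moon-003-sft-int8/tree/main

The standard, unquantized repository at https://huggingface.co/fnlp/moss-moon-003-sft/tree/main does contain an index.json file.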

Apr 24 '23 07:04 meishild

The initialization code for the quantized version seems to differ from that of the unquantized version.

Apr 24 '23 07:04 cnsky2016

Same question.

Apr 24 '23 07:04 genghaojie123

Same question.

Apr 24 '23 07:04 triumph

Same question.

Apr 24 '23 07:04 mapledxf

Same question. fnlp/moss-moon-003-sft currently does not include an index.json either.

Apr 24 '23 08:04 07freedom

Same question. Is this a version issue, or was it released without being tested? 😁

Apr 24 '23 09:04 Copilot-X

Same question.

Apr 24 '23 10:04 liushiwei123

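According to the README, the quantized and unquantized versions use different initialization code. Running the quantized model directly with the unquantized version's code produces this error.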

Apr 24 '23 10:04 cnsky2016
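
To make the mismatch concrete: the error text suggests the unquantized loading path expects a checkpoint folder with a shard index (`*.index.json`), which the quantized repository does not ship. Below is a minimal sketch, assuming the model files have been downloaded to a local folder. The helper names `has_shard_index` and `pick_loader` are hypothetical and are not part of MOSS or any library; they only show how one might check a folder before choosing which initialization code to run.

```python
from pathlib import Path


def has_shard_index(model_dir):
    """Return True if model_dir contains a sharded-checkpoint index (*.index.json)."""
    return any(Path(model_dir).glob("*.index.json"))


def pick_loader(model_dir):
    """Hypothetical helper: name the loading path that matches the folder layout.

    "sharded"     -> folder has an index file, as the unquantized repo does
    "single-file" -> no index file, as reported for the int8 repo in this thread
    """
    return "sharded" if has_shard_index(model_dir) else "single-file"
```

If `pick_loader` returns `"single-file"` for a folder, the unquantized initialization code is likely to hit the error from the title, and the quantized instructions in the README are the ones to follow.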

Same problem here. Does anyone know how to fix it?

Apr 24 '23 12:04 AreChen

Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

Apr 24 '23 13:04 Hzfinfdu

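Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.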

Apr 24 '23 15:04 07freedom

Same question.

Apr 25 '23 02:04 yangYJT

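Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.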

Apr 25 '23 06:04 erhuli
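
One way to check RAM on Linux is to read the `MemTotal` line from `/proc/meminfo`. The sketch below parses that text; the function names are made up for illustration, and the 32 GiB default threshold is only the unconfirmed figure mentioned in this thread, not a documented MOSS requirement.

```python
def mem_total_gib(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo text and return the value in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


def enough_ram(total_gib, needed_gib=32.0):
    """Compare against a threshold; 32 GiB is an assumption taken from this thread."""
    return total_gib >= needed_gib


# Typical use on a Linux machine:
#   with open("/proc/meminfo") as f:
#       print(enough_ram(mem_total_gib(f.read())))
```

A bare Killed message with no Python traceback usually means the operating system stopped the process, so this check looks at system RAM, not GPU memory.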

The quantization algorithm has fairly high memory requirements.

Apr 25 '23 06:04 Hzfinfdu

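Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.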

Apr 25 '23 06:04 Kywaldos

With a 3080 Ti, can it not even run int4? I have 64G of RAM, and adding the code provided above to moss_inference.py gives an OOM error:

Traceback (most recent call last):
  File "moss_inference.py", line 352, in <module>
    model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-plugin-int4").half().cuda()
  File "/home/arechen/.conda/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 680, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/arechen/.conda/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/arechen/.conda/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/home/arechen/.conda/envs/moss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 680, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 1.22 GiB (GPU 0; 11.76 GiB total capacity; 8.82 GiB already allocated; 383.94 MiB free; 8.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Apr 25 '23 06:04 AreChen
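
A rough estimate helps put the traceback above in context. The sketch below computes the memory taken by the weights alone at a given bit width. It assumes about 16 billion parameters for moss-moon-003 (an assumption here; check the model card) and ignores activations, quantization metadata, and CUDA overhead, so real usage will be higher.

```python
def weight_gib(n_params, bits_per_param):
    """Memory for the weights alone, in GiB: parameters * bits / 8 bytes."""
    return n_params * bits_per_param / 8 / 2 ** 30


N_PARAMS = 16e9  # assumed parameter count, not taken from this thread

for bits in (4, 8, 16):
    print(f"{bits}-bit weights: about {weight_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the int4 weights alone come to roughly 7.5 GiB, which leaves little headroom on a card reporting 11.76 GiB total capacity, as in the traceback.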

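Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.

Is there an estimate for "enough" here? Roughly how much do training and inference each need? I checked, and my current 16G of RAM is indeed on the small side.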

Apr 25 '23 07:04 07freedom

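Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.

Is there an estimate for "enough" here? Roughly how much do training and inference each need? I checked, and my current 16G of RAM is indeed on the small side.

I have not seen this documented in the project; I suggest the project team add it. 32G should be enough.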

Apr 25 '23 07:04 erhuli

I modified moss_gui_demo.py, and replacing the model-loading statement still works. You can refer to my notes: https://blog.csdn.net/genghaojie123/article/details/130357804

Apr 25 '23 07:04 genghaojie123

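Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.

Is there an estimate for "enough" here? Roughly how much do training and inference each need? I checked, and my current 16G of RAM is indeed on the small side.

I have not seen this documented in the project; I suggest the project team add it. 32G should be enough.

1. If you use WSL, by default it only uses half of the system's RAM, so you need to modify its configuration file. 2. In my testing, it starts up with 32G of RAM, but as soon as any input is sent to the model it reports Segmentation fault. I suspect this is still a memory problem.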

Apr 25 '23 08:04 silicon14
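
For the WSL point, the configuration file in question is normally `.wslconfig` in the Windows user profile folder, with a `memory` key under the `[wsl2]` section. The sketch below only renders the text of such a file; the function name and the `48GB` value are placeholders chosen for illustration, not a recommendation from this thread.

```python
import configparser
import io


def render_wslconfig(memory="48GB"):
    """Build the text of a .wslconfig that raises the WSL 2 memory limit.

    The default value is a placeholder; pick one that fits the host machine.
    """
    cfg = configparser.ConfigParser()
    cfg["wsl2"] = {"memory": memory}
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)  # emits "memory=48GB"
    return buf.getvalue()


print(render_wslconfig())
```

After saving the rendered text as `.wslconfig`, WSL has to be restarted (for example with `wsl --shutdown`) before the new limit applies.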

Has anyone solved this?

Apr 25 '23 08:04 HDRBgg

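Try updating the main function like this: `model = MossForCausalLM.from_pretrained("fnlp/moss-moon-003-sft-int4").half().cuda()` followed by `infer = Inference(model, device_map="auto")`

I applied this change in moss_inference.py and the process just printed Killed, with no other error. Model: fnlp/moss-moon-003-sft-int8; GPU: 4090 24G.

Check whether you have enough RAM. When RAM runs out, the process shows Killed.

Is there an estimate for "enough" here? Roughly how much do training and inference each need? I checked, and my current 16G of RAM is indeed on the small side.

I have not seen this documented in the project; I suggest the project team add it. 32G should be enough.

1. If you use WSL, by default it only uses half of the system's RAM, so you need to modify its configuration file. 2. In my testing, it starts up with 32G of RAM, but as soon as any input is sent to the model it reports Segmentation fault. I suspect this is still a memory problem.

glm-6B starts with just 16G of RAM. Do I really need 64G of RAM just for this one?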

Apr 25 '23 13:04 07freedom

@meta-tabchen Is your moss_web_demo_gradio.py perhaps not adapted for the quantized version?

Apr 26 '23 07:04 linonetwo

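@meta-tabchen Is your moss_web_demo_gradio.py perhaps not adapted for the quantized version?

The quantized models did not exist yet when I developed it. I'll take a look on my side first.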

Apr 26 '23 08:04 meta-tabchen

@meta-tabchen You can refer to mine; I fixed it for use in Docker: https://github.com/linonetwo/MOSS-DockerFile/blob/master/moss_web_demo_gradio.py . Note that I also added a few other things to make it easier to use in Docker.

Apr 26 '23 11:04 linonetwo
