
Bug in VRAM usage calculation for the GLM-4V model

Open Jalen-Zhong opened this issue 1 year ago • 8 comments

System Info / 系統信息

Ubuntu 18.04, python==3.10

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • [ ] docker / docker
  • [X] pip install / 通过 pip install 安装
  • [ ] installation from source / 从源码安装

Version info / 版本信息

xinference==0.13.3

The command used to start Xinference / 用以启动 xinference 的命令

XINFERENCE_MODEL_SRC=modelscope xinference cal-model-mem -s 9 -f pytorch -c 8192 -n glm-4v

Reproduction / 复现过程

  1. Run the command: XINFERENCE_MODEL_SRC=modelscope xinference cal-model-mem -s 9 -f pytorch -c 8192 -n glm-4v
  2. The command fails with the following traceback:

Traceback (most recent call last):
  File "/root/anaconda3/envs/glm-4v-x/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 1561, in cal_model_mem
    mem_info = estimate_llm_gpu_memory(
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/xinference/model/llm/memory.py", line 102, in estimate_llm_gpu_memory
    info = get_model_layers_info(
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/xinference/model/llm/memory.py", line 227, in get_model_layers_info
    return load_model_config_json(config_path)
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/xinference/model/llm/memory.py", line 186, in load_model_config_json
    vocab_size=int(_load_item_from_json(config_data, "vocab_size")),
  File "/root/anaconda3/envs/glm-4v-x/lib/python3.10/site-packages/xinference/model/llm/memory.py", line 179, in _load_item_from_json
    raise ValueError("load ModelLayersInfo: missing %s" % (keys[0]))
ValueError: load ModelLayersInfo: missing vocab_size

Expected behavior / 期待表现

Please fix the glm-4v VRAM calculation. In addition, the --quantization {precision} parameter also has problems; it would be good to look into and fix it at the same time.

Jalen-Zhong avatar Jul 29 '24 09:07 Jalen-Zhong

Thanks, @frostyplanet do you have time to look at this issue?

qinxuye avatar Jul 29 '24 09:07 qinxuye

@Jalen-Zhong This VL model's config.json format differs from other models', which is easy to handle. But VL models compute memory differently, so the original algorithm won't be accurate for them. Is there an article on the underlying principles I could read?
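For context on why a text-only formula undercounts: a dense-transformer estimator of the kind cal-model-mem implements roughly sums weight memory and KV cache from config.json fields, but a VL model such as GLM-4V additionally carries a vision encoder and injects image tokens into the context, neither of which such a formula sees. A rough sketch of the text-only estimate (all hyperparameters below are illustrative, and this is not xinference's actual code):

```python
def estimate_llm_memory_gb(num_layers, hidden_size, vocab_size,
                           intermediate_size, context_len,
                           bytes_per_param=2, batch_size=1):
    """Rough dense-transformer estimate: weights + KV cache.

    Ignores the vision tower and extra image tokens of a VL model,
    which is why this style of formula undercounts for GLM-4V.
    """
    # Per-layer weights: attention (~4 * h^2) + gated MLP (~3 * h * ffn)
    per_layer = 4 * hidden_size ** 2 + 3 * hidden_size * intermediate_size
    # Token embeddings plus LM head
    embeddings = 2 * vocab_size * hidden_size
    weight_bytes = (num_layers * per_layer + embeddings) * bytes_per_param
    # KV cache: K and V per layer, per context position (no GQA correction)
    kv_bytes = (2 * num_layers * context_len * hidden_size
                * bytes_per_param * batch_size)
    return (weight_bytes + kv_bytes) / 1024 ** 3

# Illustrative 9B-class hyperparameters (not read from any real config)
print(round(estimate_llm_memory_gb(40, 4096, 151552, 13696, 8192), 1))
```

Note that real GLM-4 models use grouped-query attention, so the KV-cache term above is an overestimate for them; the point is only the shape of the formula.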

frostyplanet avatar Jul 29 '24 10:07 frostyplanet

0.12.0 keeps proving its worth: I resolved this by downgrading to it. See #1712.

Yog-AI avatar Aug 01 '24 06:08 Yog-AI

> 0.12.0 keeps proving its worth: I resolved this by downgrading to it. See #1712.

That is not the same problem as the one reported here.

qinxuye avatar Aug 01 '24 07:08 qinxuye

xinference cal-model-mem -n glm-4v -s 9 -f pytorch -c 4096

The pytorch-format glm4v / glm4-chat config.json differs from other models' and is missing some fields:

  • vocab_size — found a substitute: padded_vocab_size
  • intermediate_size — no substitute found

Shelving this for now.
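If mapping the chatglm-family field names were acceptable, the loader could fall back to them when the llama-style keys are absent. A hypothetical sketch follows; the alternate names (padded_vocab_size, ffn_hidden_size, num_layers) are taken from chatglm-style config.json files and should be verified against the actual model config — in particular, whether ffn_hidden_size really corresponds to intermediate_size is an assumption, and this is not xinference's actual loader:

```python
# Hypothetical fallback table: llama-style field -> candidate keys,
# tried in order. The chatglm-family alternates are assumptions to
# verify against the real config.json.
FALLBACK_KEYS = {
    "vocab_size": ("vocab_size", "padded_vocab_size"),
    "intermediate_size": ("intermediate_size", "ffn_hidden_size"),
    "num_hidden_layers": ("num_hidden_layers", "num_layers"),
}

def load_field(config_data, field):
    """Return the field, trying chatglm-style alternates before failing."""
    for key in FALLBACK_KEYS.get(field, (field,)):
        if key in config_data:
            return int(config_data[key])
    raise ValueError("load ModelLayersInfo: missing %s" % field)

# chatglm-style config fragment (values illustrative)
cfg = {"padded_vocab_size": 151552, "ffn_hidden_size": 13696,
       "num_layers": 40, "hidden_size": 4096}
print(load_field(cfg, "vocab_size"))         # 151552
print(load_field(cfg, "intermediate_size"))  # 13696
```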

frostyplanet avatar Aug 15 '24 03:08 frostyplanet

The ggufv2 version's config.json can't be downloaded either:

$ env XINFERENCE_MODEL_SRC=modelscope xinference cal-model-mem -n glm4-chat -s 9 -f ggufv2 -c 4096
modelscope.hub.errors.NotExistError: The file path: config.json not exist in: LLM-Research/glm-4-9b-chat-GGUF

$ env XINFERENCE_MODEL_SRC=huggingface HF_ENDPOINT="https://hf-mirror.com" xinference cal-model-mem -n glm4-chat -s 9 -f ggufv2 -c 4096
Entry Not Found for url: https://hf-mirror.com/legraphista/glm-4-9b-chat-GGUF/resolve/main/config.json.

frostyplanet avatar Aug 15 '24 04:08 frostyplanet

Same here on xinference version 0.15.2, with codegeex4:

xinference cal-model-mem -n codegeex4 -s 9 -f pytorch -c 2048
ValueError: load ModelLayersInfo: missing intermediate_size

xinference cal-model-mem -n glm4-chat -s 9 -f pytorch -c 4000 -q 4-bit
ValueError: load ModelLayersInfo: missing vocab_size

Justin-12138 avatar Sep 24 '24 08:09 Justin-12138

cal-model-mem currently does not support the GLM family of models; their configs differ from the llama family's.

qinxuye avatar Sep 25 '24 08:09 qinxuye

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Feb 13 '25 19:02 github-actions[bot]