
[Help] <DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.>

nor1take opened this issue 2 years ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

The error output is as follows:

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "E:\PycharmProjects\ChatGLM\cli_demo.py", line 7, in <module>
    model = AutoModel.from_pretrained("E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
    return model_class.from_pretrained(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1047, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in __init__
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in <listcomp>
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 829, in get_layer
    return GLMBlock(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 598, in __init__
    self.mlp = GLU(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 531, in __init__
    self.dense_4h_to_h = init_method(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
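For scale, the failing allocation is exactly 128 MiB, which matches a single fp16 weight matrix of the `dense_4h_to_h` layer named in the traceback. A quick sanity check, where the 4096/16384 dimensions are assumptions taken from ChatGLM-6B's published configuration (`hidden_size` and `inner_hidden_size`):

```python
# The failed allocation size reported by DefaultCPUAllocator.
failed_bytes = 134217728

# Assumed ChatGLM-6B dimensions (hidden_size=4096, inner_hidden_size=16384).
hidden_size = 4096
inner_hidden_size = 16384
fp16_bytes = 2  # bytes per half-precision element

# One weight matrix of the dense_4h_to_h Linear layer in fp16:
weight_bytes = hidden_size * inner_hidden_size * fp16_bytes

print(weight_bytes == failed_bytes)  # → True
print(failed_bytes / 2**20, "MiB")   # → 128.0 MiB
```

So the load fails on a routine 128 MiB chunk, which suggests overall CPU memory (or the Windows commit limit) is already exhausted, rather than one oversized request.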

Expected Behavior

No response

Steps To Reproduce

# cli_demo.py
from transformers import AutoModel, AutoTokenizer

# Raw strings avoid backslashes in the Windows path being read as escape sequences.
tokenizer = AutoTokenizer.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
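The traceback shows `skip_init`/`to_empty` materializing full-size fp16 tensors layer by layer on the CPU before the quantized weights are loaded, so peak CPU commit can approach the size of the full fp16 model. A back-of-the-envelope estimate, assuming the commonly cited ~6.2B parameter count for ChatGLM-6B (an assumption from the model card, not measured here):

```python
# Rough peak CPU memory if every layer is materialized in fp16 during init.
params = 6.2e9      # assumed parameter count
fp16_bytes = 2      # bytes per half-precision element

peak_gib = params * fp16_bytes / 2**30
print(f"~{peak_gib:.1f} GiB")  # roughly 11.5 GiB, before Python/OS overhead
```

That is why a 16GB machine with a small pagefile can fail here even for the INT4 model; enlarging the Windows pagefile (virtual memory) is the workaround most commonly reported for this error.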

Environment

- OS: Win 10
- Python: 3.10
- Transformers:
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true

Anything else?

RAM: 16GB


GPU:7.9GB


Virtual memory: set to "System managed"


nor1take avatar Apr 15 '23 08:04 nor1take

I'm hitting the same problem, also with 16GB of RAM.

- OS: Win 11
- Python: 3.11
- Transformers:
- PyTorch: 2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true

picasso250 avatar Apr 26 '23 04:04 picasso250

... even an i7-12700K + ROG 3080 12GB + 32GB of RAM can't run it


mayflyfy avatar May 20 '23 14:05 mayflyfy

@mayflyfy That's honestly pretty absurd.

nor1take avatar May 20 '23 14:05 nor1take

> @mayflyfy That's honestly pretty absurd.

This build cost me about 18,000 RMB. With every other program closed and the machine freshly rebooted, the INT8 model just barely manages to start; open a browser and it won't start anymore, believe it or not.

In the cyber era, no money means no playing with LLMs (doge)

mayflyfy avatar May 20 '23 14:05 mayflyfy