ChatGLM-6B
[Help] <DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.>
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
The error output is as follows:
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "E:\PycharmProjects\ChatGLM\cli_demo.py", line 7, in <module>
    model = AutoModel.from_pretrained("E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
    return model_class.from_pretrained(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1047, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in __init__
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in <listcomp>
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 829, in get_layer
    return GLMBlock(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 598, in __init__
    self.mlp = GLU(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 531, in __init__
    self.dense_4h_to_h = init_method(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
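For scale, the failed request itself is small: 134217728 bytes is exactly 128 MiB. Assuming ChatGLM-6B's usual shapes (hidden size 4096, inner size 4 × 4096 = 16384), that corresponds to exactly one dense_4h_to_h weight matrix in half precision:

# Back-of-the-envelope check (the shapes are assumptions, not taken from the log):
params = 4096 * 16384   # one GLU dense_4h_to_h weight
print(params * 2)       # 2 bytes per fp16 element -> 134217728 bytes = 128 MiB

So no single allocation is outlandish; on Windows this error usually means the commit limit (physical RAM plus page file) was already exhausted by the layers materialized earlier in the loop above.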
Expected Behavior
No response
Steps To Reproduce
# cli_demo.py
from transformers import AutoTokenizer, AutoModel  # import was missing from the snippet

# Raw strings avoid backslash-escape surprises in Windows paths.
tokenizer = AutoTokenizer.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
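Because every layer is materialized in CPU memory before .cuda() ever runs, it is worth confirming the headroom before loading. A minimal diagnostic sketch, assuming psutil is installed (it is not part of the original script):

# Hypothetical pre-flight check, not from the original cli_demo.py:
import psutil
import torch

vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM available: {vm.available / 2**30:.1f} GiB of {vm.total / 2**30:.1f} GiB")
print(f"page file:     {sw.total / 2**30:.1f} GiB")
if torch.cuda.is_available():
    print(f"GPU memory:    {torch.cuda.get_device_properties(0).total_memory / 2**30:.1f} GiB")

If the available figure is down to a GiB or two, the remedies commonly reported for this error are closing other programs or giving Windows a larger, fixed-size page file instead of a system-managed one.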
Environment
- OS: Win 10
- Python: 3.10
- Transformers:
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true
Anything else?
RAM: 16 GB

GPU memory: 7.9 GB

Virtual memory: set to "System managed size"

I'm hitting the same problem, also with 16 GB of RAM:
- OS: Win 11
- Python: 3.11
- Transformers:
- PyTorch: 2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true
... An i7-12700K + ROG 3080 12 GB + 32 GB of RAM still can't run it.
@mayflyfy That's honestly a bit absurd.
My build cost about 18,000 RMB. With every program closed and the machine freshly rebooted, the INT8 model just barely manages to start; open a browser and it won't start anymore. Can you believe that?
In the cyber era, you don't get to play with LLMs without money (doge).