ChatGLM-6B
[Help] <DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.>
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
The error output is as follows:
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "E:\PycharmProjects\ChatGLM\cli_demo.py", line 7, in <module>
    model = AutoModel.from_pretrained("E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
    return model_class.from_pretrained(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1047, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in __init__
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in <listcomp>
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 829, in get_layer
    return GLMBlock(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 598, in __init__
    self.mlp = GLU(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 531, in __init__
    self.dense_4h_to_h = init_method(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
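For scale, the failed request itself is small: 134217728 bytes is exactly 128 MiB. Assuming ChatGLM-6B's usual shapes (hidden size 4096, inner size 4 × 4096 = 16384), that corresponds to exactly one dense_4h_to_h weight matrix in half precision:

# Back-of-the-envelope check (the shapes are assumptions, not taken from the log):
params = 4096 * 16384   # one GLU dense_4h_to_h weight
print(params * 2)       # 2 bytes per fp16 element -> 134217728 bytes = 128 MiB

So no single allocation is outlandish; on Windows this error usually means the commit limit (physical RAM plus page file) was already exhausted by the layers materialized earlier in the loop above.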
Expected Behavior
No response
Steps To Reproduce
# cli_demo.py
from transformers import AutoTokenizer, AutoModel  # import was missing from the snippet

# Raw strings avoid backslash-escape surprises in Windows paths.
tokenizer = AutoTokenizer.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
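Because every layer is materialized in CPU memory before .cuda() ever runs, it is worth confirming the headroom before loading. A minimal diagnostic sketch, assuming psutil is installed (it is not part of the original script):

# Hypothetical pre-flight check, not from the original cli_demo.py:
import psutil
import torch

vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM available: {vm.available / 2**30:.1f} GiB of {vm.total / 2**30:.1f} GiB")
print(f"page file:     {sw.total / 2**30:.1f} GiB")
if torch.cuda.is_available():
    print(f"GPU memory:    {torch.cuda.get_device_properties(0).total_memory / 2**30:.1f} GiB")

If the available figure is down to a GiB or two, the remedies commonly reported for this error are closing other programs or giving Windows a larger, fixed-size page file instead of a system-managed one.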
Environment
- OS: Win 10
- Python: 3.10
- Transformers:
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true
Anything else?
RAM: 16 GB

GPU memory: 7.9 GB

Virtual memory: set to "System managed size"

I'm hitting the same problem, also with 16 GB of RAM:
- OS: Win 11
- Python: 3.11
- Transformers:
- PyTorch: 2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true
... An i7-12700K + ROG 3080 12 GB + 32 GB of RAM still can't run it.
@mayflyfy That's honestly a bit absurd.
My build cost about 18,000 RMB. With every program closed and the machine freshly rebooted, the INT8 model just barely manages to start; open a browser and it won't start anymore. Can you believe that?
In the cyber era, you don't get to play with LLMs without money (doge).