
convert_gpu_weights.py crashed by CUDA out of memory, even with --force_cpu

Open defWorldBetter opened this issue 1 month ago • 2 comments

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

Intel(R) Xeon(R) Platinum 8461V + 3090 24G + 384G mem

Reproduction

🎯 Starting one-shot quantization...
2025-11-21T01:12:06.210574+0800 | reset | INFO - Compression lifecycle reset
2025-11-21T01:12:06.365277+0800 | _create_default_logger | INFO - Logging all LLM Compressor modifier-level logs to sparse_logs/21-11-2025_01.12.06.log
2025-11-21T01:12:06.365710+0800 | from_modifiers | INFO - Creating recipe from modifiers
2025-11-21T01:12:16.006866+0800 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2025-11-21T01:12:16.006997+0800 | IndependentPipeline | INFO - Inferred `SequentialPipeline` for `GPTQModifier`
Preparing cache: 100%|████████████████████| 1024/1024 [00:02<00:00, 471.54it/s]
(1/93): Calibrating: 100%|████████████████████| 1024/1024 [00:11<00:00, 88.18it/s]
(1/93): Propagating: 100%|████████████████████| 1024/1024 [00:14<00:00, 68.51it/s]
(2/93): Calibrating: 100%|████████████████████| 1024/1024 [00:11<00:00, 87.54it/s]
(2/93): Propagating: 100%|████████████████████| 1024/1024 [00:14<00:00, 72.97it/s]
(3/93): Calibrating: 100%|████████████████████| 1024/1024 [00:11<00:00, 86.87it/s]
(3/93): Propagating: 100%|████████████████████| 1024/1024 [00:12<00:00, 84.10it/s]
(4/93): Calibrating:   0%|                                                                                                                                                                              | 0/1024 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 73, in forward
    outputs = forward_fn(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in forward
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 395, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 345, in forward
    hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 331, in moe
    expert_output = expert(expert_input)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 223, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                                                                ^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1840, in inner
    hook_result = hook(self, args, result)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/utils/hooks.py", line 93, in wrapped_hook
    return hook(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/quantization/gptq/base.py", line 230, in calibrate_module
    self._hessians[module] = make_empty_hessian(module, device=init_device)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py", line 30, in make_empty_hessian
    return torch.zeros((num_columns, num_columns), device=device, dtype=GPTQ_PRECISION)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB. GPU 0 has a total capacity of 23.57 GiB of which 65.19 MiB is free. Process 3964 has 254.00 MiB memory in use. Including non-PyTorch memory, this process has 23.23 GiB memory in use. Of the allocated memory 22.33 GiB is allocated by PyTorch, and 607.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/work/ktransformers/ktransformers/kt-kernel/scripts/convert_gpu_weights.py", line 376, in <module>
    main()
  File "/work/ktransformers/ktransformers/kt-kernel/scripts/convert_gpu_weights.py", line 360, in main
    oneshot(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 330, in oneshot
    one_shot()
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 158, in __call__
    self.apply_recipe_modifiers(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 201, in apply_recipe_modifiers
    pipeline(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 104, in __call__
    subgraph.forward(model, **inputs)
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 75, in forward
    raise RuntimeError(
RuntimeError: Raised an exception during execution of the following code:

def forward(self, model_layers_2, model_rotary_emb, wrapped_5, getitem_3, getitem_1):
    model_layers_3 = getattr(self.model.layers, "3")(model_layers_2, attention_mask = wrapped_5, position_ids = getitem_3, past_key_values = None, cache_position = getitem_1, position_embeddings = model_rotary_emb); model_layers_2 = wrapped_5 = getitem_3 = getitem_1 = model_rotary_emb = None
    return {'model_layers_3': model_layers_3}
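For reference, the failed 100 MiB allocation matches the size of a single GPTQ Hessian buffer. A quick sanity check (the hidden size of 5120 is an assumption inferred from the 100 MiB figure, not something stated in the log; `GPTQ_PRECISION` is assumed to be float32):

```python
# make_empty_hessian allocates a square (num_columns, num_columns) buffer,
# where num_columns is the input width of the linear layer being calibrated.
num_columns = 5120  # assumed layer input width; chosen to match the 100 MiB figure
hessian_bytes = num_columns * num_columns * 4  # 4 bytes per float32 element
print(hessian_bytes / 2**20)  # 100.0 (MiB), matching "Tried to allocate 100.00 MiB"
```

So the OOM is not from loading weights but from accumulating one such Hessian per expert linear module on the GPU during calibration.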

Others

command: python scripts/convert_gpu_weights.py --model_id /media/data/models/GLM-4.6/ --output_dir /models/ZhipuAI/GLM-4.6-GPTQ8 --force_cpu --trust_remote_code --max_sequence_length 1024 --num_calibration_samples 1024 --quant_type W4A16

I saw the comment "# Force all modules to CPU for quantization if args.force_cpu" in the script. Does enabling this parameter mean the quantization process should use only system memory, independent of GPU memory? Otherwise, if I had enough GPU memory to load the full weights, I wouldn't need to convert this way in the first place.
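My understanding is that `--force_cpu` only affects the device map used at load time. A minimal sketch of what a CPU-only map looks like (`build_cpu_device_map` is a hypothetical helper for illustration, not the script's actual code):

```python
# Hypothetical illustration of an accelerate-style CPU-only device map;
# the real logic in convert_gpu_weights.py may differ.
def build_cpu_device_map(num_layers: int) -> dict:
    device_map = {"model.embed_tokens": "cpu", "model.norm": "cpu", "lm_head": "cpu"}
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = "cpu"
    return device_map

device_map = build_cpu_device_map(93)
# Every module is pinned to "cpu", so weight loading stays in system RAM;
# the OOM above suggests the GPTQ calibration buffers land on CUDA anyway.
```

If that is the case, a map like this keeps the weights off the GPU but does nothing about the Hessians that GPTQ allocates during calibration.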

defWorldBetter avatar Nov 20 '25 17:11 defWorldBetter

I also encountered this bug. By setting CUDA_VISIBLE_DEVICES="" I got a literal "force cpu". I think the script's config option is not working as intended, and the script's docs need to be updated with more details. @ovowei @qiyuxinlin
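The same workaround can be applied from inside Python, as long as it runs before torch (or any other CUDA library) is imported:

```python
import os

# Hiding every GPU before any CUDA library initializes forces a genuine
# CPU-only run; this is exactly what prefixing the command with
# CUDA_VISIBLE_DEVICES="" does.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
# After this point, torch.cuda.is_available() would return False
# (assuming torch has not been imported yet).
```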

KMSorSMS avatar Nov 22 '25 13:11 KMSorSMS

@ovowei I verified that this PR does not fix the issue. Whether I set the --max_gpu_memory parameter or --force_cpu, it still reports CUDA out of memory:

root@hao-Super-Server:/work/ktransformers/ktransformers/kt-kernel# python scripts/convert_gpu_weights.py --model_id /media/data/models/GLM-4.6/ --output_dir /models/ZhipuAI/GLM-4.6-GPTQ4 --trust_remote_code --force_cpu --quant_type W4A16
🔧 Forced CPU-only mode
🚀 Starting quantization process
   Model: /media/data/models/GLM-4.6/
   Output: /models/ZhipuAI/GLM-4.6-GPTQ4
   Quantization: W4A16
   Calibration samples: 512
   Max sequence length: 2048
๐Ÿ” Checking model configuration for dense layers...
โœ… Found dense layers configuration: first_k_dense_replace = 3
   Adding first 3 layers to ignore list...
   Dense layer pattern added: re:model\.layers\.[0-2]\.mlp\..*$
   This will ignore MLP components in layers 0-2
๐Ÿ” Building CPU-only device map...
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████████████████| 93/93 [00:06<00:00, 14.58it/s]
Some weights of the model checkpoint at /media/data/models/GLM-4.6/ were not used when initializing Glm4MoeForCausalLM: ['model.layers.92.eh_proj.weight', 'model.layers.92.enorm.weight', 'model.layers.92.hnorm.weight', 'model.layers.92.input_layernorm.weight', 'model.layers.92.mlp.experts.0.down_proj.weight', 'model.layers.92.mlp.experts.0.gate_proj.weight', 'model.layers.92.mlp.experts.0.up_proj.weight', 'model.layers.92.mlp.experts.1.down_proj.weight', 'model.layers.92.mlp.experts.1.gate_proj.weight', 'model.layers.92.mlp.experts.1.up_proj.weight', 'model.layers.92.mlp.experts.10.down_proj.weight', 'model.layers.92.mlp.experts.10.gate_proj.weight', 'model.layers.92.mlp.experts.10.up_proj.weight', 'model.layers.92.mlp.experts.100.down_proj.weight', 'model.layers.92.mlp.experts.100.gate_proj.weight', 'model.layers.92.mlp.experts.100.up_proj.weight', 'model.layers.92.mlp.experts.101.down_proj.weight', 'model.layers.92.mlp.experts.101.gate_proj.weight', 'model.layers.92.mlp.experts.101.up_proj.weight', 'model.layers.92.mlp.experts.102.down_proj.weight', 'model.layers.92.mlp.experts.102.gate_proj.weight', 'model.layers.92.mlp.experts.102.up_proj.weight', 'model.layers.92.mlp.experts.103.down_proj.weight', 'model.layers.92.mlp.experts.103.gate_proj.weight', 'model.layers.92.mlp.experts.103.up_proj.weight', 'model.layers.92.mlp.experts.104.down_proj.weight', 'model.layers.92.mlp.experts.104.gate_proj.weight', 'model.layers.92.mlp.experts.104.up_proj.weight', 'model.layers.92.mlp.experts.105.down_proj.weight', 'model.layers.92.mlp.experts.105.gate_proj.weight', 'model.layers.92.mlp.experts.105.up_proj.weight', 'model.layers.92.mlp.experts.106.down_proj.weight', 'model.layers.92.mlp.experts.106.gate_proj.weight', 'model.layers.92.mlp.experts.106.up_proj.weight', 'model.layers.92.mlp.experts.107.down_proj.weight', 'model.layers.92.mlp.experts.107.gate_proj.weight', 'model.layers.92.mlp.experts.107.up_proj.weight', 'model.layers.92.mlp.experts.108.down_proj.weight', 
'model.layers.92.mlp.experts.108.gate_proj.weight', 'model.layers.92.mlp.experts.108.up_proj.weight', 'model.layers.92.mlp.experts.109.down_proj.weight', 'model.layers.92.mlp.experts.109.gate_proj.weight', 'model.layers.92.mlp.experts.109.up_proj.weight', 'model.layers.92.mlp.experts.11.down_proj.weight', 'model.layers.92.mlp.experts.11.gate_proj.weight', 'model.layers.92.mlp.experts.11.up_proj.weight', 'model.layers.92.mlp.experts.110.down_proj.weight', 'model.layers.92.mlp.experts.110.gate_proj.weight', 'model.layers.92.mlp.experts.110.up_proj.weight', 'model.layers.92.mlp.experts.111.down_proj.weight', 'model.layers.92.mlp.experts.111.gate_proj.weight', 'model.layers.92.mlp.experts.111.up_proj.weight', 'model.layers.92.mlp.experts.112.down_proj.weight', 'model.layers.92.mlp.experts.112.gate_proj.weight', 'model.layers.92.mlp.experts.112.up_proj.weight', 'model.layers.92.mlp.experts.113.down_proj.weight', 'model.layers.92.mlp.experts.113.gate_proj.weight', 'model.layers.92.mlp.experts.113.up_proj.weight', 'model.layers.92.mlp.experts.114.down_proj.weight', 'model.layers.92.mlp.experts.114.gate_proj.weight', 'model.layers.92.mlp.experts.114.up_proj.weight', 'model.layers.92.mlp.experts.115.down_proj.weight', 'model.layers.92.mlp.experts.115.gate_proj.weight', 'model.layers.92.mlp.experts.115.up_proj.weight', 'model.layers.92.mlp.experts.116.down_proj.weight', 'model.layers.92.mlp.experts.116.gate_proj.weight', 'model.layers.92.mlp.experts.116.up_proj.weight', 'model.layers.92.mlp.experts.117.down_proj.weight', 'model.layers.92.mlp.experts.117.gate_proj.weight', 'model.layers.92.mlp.experts.117.up_proj.weight', 'model.layers.92.mlp.experts.118.down_proj.weight', 'model.layers.92.mlp.experts.118.gate_proj.weight', 'model.layers.92.mlp.experts.118.up_proj.weight', 'model.layers.92.mlp.experts.119.down_proj.weight', 'model.layers.92.mlp.experts.119.gate_proj.weight', 'model.layers.92.mlp.experts.119.up_proj.weight', 'model.layers.92.mlp.experts.12.down_proj.weight', 
'model.layers.92.mlp.experts.12.gate_proj.weight', 'model.layers.92.mlp.experts.12.up_proj.weight', 'model.layers.92.mlp.experts.120.down_proj.weight', 'model.layers.92.mlp.experts.120.gate_proj.weight', 'model.layers.92.mlp.experts.120.up_proj.weight', 'model.layers.92.mlp.experts.121.down_proj.weight', 'model.layers.92.mlp.experts.121.gate_proj.weight', 'model.layers.92.mlp.experts.121.up_proj.weight', 'model.layers.92.mlp.experts.122.down_proj.weight', 'model.layers.92.mlp.experts.122.gate_proj.weight', 'model.layers.92.mlp.experts.122.up_proj.weight', 'model.layers.92.mlp.experts.123.down_proj.weight', 'model.layers.92.mlp.experts.123.gate_proj.weight', 'model.layers.92.mlp.experts.123.up_proj.weight', 'model.layers.92.mlp.experts.124.down_proj.weight', 'model.layers.92.mlp.experts.124.gate_proj.weight', 'model.layers.92.mlp.experts.124.up_proj.weight', 'model.layers.92.mlp.experts.125.down_proj.weight', 'model.layers.92.mlp.experts.125.gate_proj.weight', 'model.layers.92.mlp.experts.125.up_proj.weight', 'model.layers.92.mlp.experts.126.down_proj.weight', 'model.layers.92.mlp.experts.126.gate_proj.weight', 'model.layers.92.mlp.experts.126.up_proj.weight', 'model.layers.92.mlp.experts.127.down_proj.weight', 'model.layers.92.mlp.experts.127.gate_proj.weight', 'model.layers.92.mlp.experts.127.up_proj.weight', 'model.layers.92.mlp.experts.128.down_proj.weight', 'model.layers.92.mlp.experts.128.gate_proj.weight', 'model.layers.92.mlp.experts.128.up_proj.weight', 'model.layers.92.mlp.experts.129.down_proj.weight', 'model.layers.92.mlp.experts.129.gate_proj.weight', 'model.layers.92.mlp.experts.129.up_proj.weight', 'model.layers.92.mlp.experts.13.down_proj.weight', 'model.layers.92.mlp.experts.13.gate_proj.weight', 'model.layers.92.mlp.experts.13.up_proj.weight', 'model.layers.92.mlp.experts.130.down_proj.weight', 'model.layers.92.mlp.experts.130.gate_proj.weight', 'model.layers.92.mlp.experts.130.up_proj.weight', 'model.layers.92.mlp.experts.131.down_proj.weight', 
'model.layers.92.mlp.experts.131.gate_proj.weight', 'model.layers.92.mlp.experts.131.up_proj.weight', 'model.layers.92.mlp.experts.132.down_proj.weight', 'model.layers.92.mlp.experts.132.gate_proj.weight', 'model.layers.92.mlp.experts.132.up_proj.weight', 'model.layers.92.mlp.experts.133.down_proj.weight', 'model.layers.92.mlp.experts.133.gate_proj.weight', 'model.layers.92.mlp.experts.133.up_proj.weight', 'model.layers.92.mlp.experts.134.down_proj.weight', 'model.layers.92.mlp.experts.134.gate_proj.weight', 'model.layers.92.mlp.experts.134.up_proj.weight', 'model.layers.92.mlp.experts.135.down_proj.weight', 'model.layers.92.mlp.experts.135.gate_proj.weight', 'model.layers.92.mlp.experts.135.up_proj.weight', 'model.layers.92.mlp.experts.136.down_proj.weight', 'model.layers.92.mlp.experts.136.gate_proj.weight', 'model.layers.92.mlp.experts.136.up_proj.weight', 'model.layers.92.mlp.experts.137.down_proj.weight', 'model.layers.92.mlp.experts.137.gate_proj.weight', 'model.layers.92.mlp.experts.137.up_proj.weight', 'model.layers.92.mlp.experts.138.down_proj.weight', 'model.layers.92.mlp.experts.138.gate_proj.weight', 'model.layers.92.mlp.experts.138.up_proj.weight', 'model.layers.92.mlp.experts.139.down_proj.weight', 'model.layers.92.mlp.experts.139.gate_proj.weight', 'model.layers.92.mlp.experts.139.up_proj.weight', 'model.layers.92.mlp.experts.14.down_proj.weight', 'model.layers.92.mlp.experts.14.gate_proj.weight', 'model.layers.92.mlp.experts.14.up_proj.weight', 'model.layers.92.mlp.experts.140.down_proj.weight', 'model.layers.92.mlp.experts.140.gate_proj.weight', 'model.layers.92.mlp.experts.140.up_proj.weight', 'model.layers.92.mlp.experts.141.down_proj.weight', 'model.layers.92.mlp.experts.141.gate_proj.weight', 'model.layers.92.mlp.experts.141.up_proj.weight', 'model.layers.92.mlp.experts.142.down_proj.weight', 'model.layers.92.mlp.experts.142.gate_proj.weight', 'model.layers.92.mlp.experts.142.up_proj.weight', 'model.layers.92.mlp.experts.143.down_proj.weight', 
'model.layers.92.mlp.experts.143.gate_proj.weight', 'model.layers.92.mlp.experts.143.up_proj.weight', 'model.layers.92.mlp.experts.144.down_proj.weight', 'model.layers.92.mlp.experts.144.gate_proj.weight', 'model.layers.92.mlp.experts.144.up_proj.weight', 'model.layers.92.mlp.experts.145.down_proj.weight', 'model.layers.92.mlp.experts.145.gate_proj.weight', 'model.layers.92.mlp.experts.145.up_proj.weight', 'model.layers.92.mlp.experts.146.down_proj.weight', 'model.layers.92.mlp.experts.146.gate_proj.weight', 'model.layers.92.mlp.experts.146.up_proj.weight', 'model.layers.92.mlp.experts.147.down_proj.weight', 'model.layers.92.mlp.experts.147.gate_proj.weight', 'model.layers.92.mlp.experts.147.up_proj.weight', 'model.layers.92.mlp.experts.148.down_proj.weight', 'model.layers.92.mlp.experts.148.gate_proj.weight', 'model.layers.92.mlp.experts.148.up_proj.weight', 'model.layers.92.mlp.experts.149.down_proj.weight', 'model.layers.92.mlp.experts.149.gate_proj.weight', 'model.layers.92.mlp.experts.149.up_proj.weight', 'model.layers.92.mlp.experts.15.down_proj.weight', 'model.layers.92.mlp.experts.15.gate_proj.weight', 'model.layers.92.mlp.experts.15.up_proj.weight', 'model.layers.92.mlp.experts.150.down_proj.weight', 'model.layers.92.mlp.experts.150.gate_proj.weight', 'model.layers.92.mlp.experts.150.up_proj.weight', 'model.layers.92.mlp.experts.151.down_proj.weight', 'model.layers.92.mlp.experts.151.gate_proj.weight', 'model.layers.92.mlp.experts.151.up_proj.weight', 'model.layers.92.mlp.experts.152.down_proj.weight', 'model.layers.92.mlp.experts.152.gate_proj.weight', 'model.layers.92.mlp.experts.152.up_proj.weight', 'model.layers.92.mlp.experts.153.down_proj.weight', 'model.layers.92.mlp.experts.153.gate_proj.weight', 'model.layers.92.mlp.experts.153.up_proj.weight', 'model.layers.92.mlp.experts.154.down_proj.weight', 'model.layers.92.mlp.experts.154.gate_proj.weight', 'model.layers.92.mlp.experts.154.up_proj.weight', 'model.layers.92.mlp.experts.155.down_proj.weight', 
'model.layers.92.mlp.experts.155.gate_proj.weight', 'model.layers.92.mlp.experts.155.up_proj.weight', 'model.layers.92.mlp.experts.156.down_proj.weight', 'model.layers.92.mlp.experts.156.gate_proj.weight', 'model.layers.92.mlp.experts.156.up_proj.weight', 'model.layers.92.mlp.experts.157.down_proj.weight', 'model.layers.92.mlp.experts.157.gate_proj.weight', 'model.layers.92.mlp.experts.157.up_proj.weight', 'model.layers.92.mlp.experts.158.down_proj.weight', 'model.layers.92.mlp.experts.158.gate_proj.weight', 'model.layers.92.mlp.experts.158.up_proj.weight', 'model.layers.92.mlp.experts.159.down_proj.weight', 'model.layers.92.mlp.experts.159.gate_proj.weight', 'model.layers.92.mlp.experts.159.up_proj.weight', 'model.layers.92.mlp.experts.16.down_proj.weight', 'model.layers.92.mlp.experts.16.gate_proj.weight', 'model.layers.92.mlp.experts.16.up_proj.weight', 'model.layers.92.mlp.experts.17.down_proj.weight', 'model.layers.92.mlp.experts.17.gate_proj.weight', 'model.layers.92.mlp.experts.17.up_proj.weight', 'model.layers.92.mlp.experts.18.down_proj.weight', 'model.layers.92.mlp.experts.18.gate_proj.weight', 'model.layers.92.mlp.experts.18.up_proj.weight', 'model.layers.92.mlp.experts.19.down_proj.weight', 'model.layers.92.mlp.experts.19.gate_proj.weight', 'model.layers.92.mlp.experts.19.up_proj.weight', 'model.layers.92.mlp.experts.2.down_proj.weight', 'model.layers.92.mlp.experts.2.gate_proj.weight', 'model.layers.92.mlp.experts.2.up_proj.weight', 'model.layers.92.mlp.experts.20.down_proj.weight', 'model.layers.92.mlp.experts.20.gate_proj.weight', 'model.layers.92.mlp.experts.20.up_proj.weight', 'model.layers.92.mlp.experts.21.down_proj.weight', 'model.layers.92.mlp.experts.21.gate_proj.weight', 'model.layers.92.mlp.experts.21.up_proj.weight', 'model.layers.92.mlp.experts.22.down_proj.weight', 'model.layers.92.mlp.experts.22.gate_proj.weight', 'model.layers.92.mlp.experts.22.up_proj.weight', 'model.layers.92.mlp.experts.23.down_proj.weight', 
'model.layers.92.mlp.experts.23.gate_proj.weight', 'model.layers.92.mlp.experts.23.up_proj.weight', 'model.layers.92.mlp.experts.24.down_proj.weight', 'model.layers.92.mlp.experts.24.gate_proj.weight', 'model.layers.92.mlp.experts.24.up_proj.weight', 'model.layers.92.mlp.experts.25.down_proj.weight', 'model.layers.92.mlp.experts.25.gate_proj.weight', 'model.layers.92.mlp.experts.25.up_proj.weight', 'model.layers.92.mlp.experts.26.down_proj.weight', 'model.layers.92.mlp.experts.26.gate_proj.weight', 'model.layers.92.mlp.experts.26.up_proj.weight', 'model.layers.92.mlp.experts.27.down_proj.weight', 'model.layers.92.mlp.experts.27.gate_proj.weight', 'model.layers.92.mlp.experts.27.up_proj.weight', 'model.layers.92.mlp.experts.28.down_proj.weight', 'model.layers.92.mlp.experts.28.gate_proj.weight', 'model.layers.92.mlp.experts.28.up_proj.weight', 'model.layers.92.mlp.experts.29.down_proj.weight', 'model.layers.92.mlp.experts.29.gate_proj.weight', 'model.layers.92.mlp.experts.29.up_proj.weight', 'model.layers.92.mlp.experts.3.down_proj.weight', 'model.layers.92.mlp.experts.3.gate_proj.weight', 'model.layers.92.mlp.experts.3.up_proj.weight', 'model.layers.92.mlp.experts.30.down_proj.weight', 'model.layers.92.mlp.experts.30.gate_proj.weight', 'model.layers.92.mlp.experts.30.up_proj.weight', 'model.layers.92.mlp.experts.31.down_proj.weight', 'model.layers.92.mlp.experts.31.gate_proj.weight', 'model.layers.92.mlp.experts.31.up_proj.weight', 'model.layers.92.mlp.experts.32.down_proj.weight', 'model.layers.92.mlp.experts.32.gate_proj.weight', 'model.layers.92.mlp.experts.32.up_proj.weight', 'model.layers.92.mlp.experts.33.down_proj.weight', 'model.layers.92.mlp.experts.33.gate_proj.weight', 'model.layers.92.mlp.experts.33.up_proj.weight', 'model.layers.92.mlp.experts.34.down_proj.weight', 'model.layers.92.mlp.experts.34.gate_proj.weight', 'model.layers.92.mlp.experts.34.up_proj.weight', 'model.layers.92.mlp.experts.35.down_proj.weight', 
'model.layers.92.mlp.experts.35.gate_proj.weight', 'model.layers.92.mlp.experts.35.up_proj.weight', 'model.layers.92.mlp.experts.36.down_proj.weight', 'model.layers.92.mlp.experts.36.gate_proj.weight', 'model.layers.92.mlp.experts.36.up_proj.weight', 'model.layers.92.mlp.experts.37.down_proj.weight', 'model.layers.92.mlp.experts.37.gate_proj.weight', 'model.layers.92.mlp.experts.37.up_proj.weight', 'model.layers.92.mlp.experts.38.down_proj.weight', 'model.layers.92.mlp.experts.38.gate_proj.weight', 'model.layers.92.mlp.experts.38.up_proj.weight', 'model.layers.92.mlp.experts.39.down_proj.weight', 'model.layers.92.mlp.experts.39.gate_proj.weight', 'model.layers.92.mlp.experts.39.up_proj.weight', 'model.layers.92.mlp.experts.4.down_proj.weight', 'model.layers.92.mlp.experts.4.gate_proj.weight', 'model.layers.92.mlp.experts.4.up_proj.weight', 'model.layers.92.mlp.experts.40.down_proj.weight', 'model.layers.92.mlp.experts.40.gate_proj.weight', 'model.layers.92.mlp.experts.40.up_proj.weight', 'model.layers.92.mlp.experts.41.down_proj.weight', 'model.layers.92.mlp.experts.41.gate_proj.weight', 'model.layers.92.mlp.experts.41.up_proj.weight', 'model.layers.92.mlp.experts.42.down_proj.weight', 'model.layers.92.mlp.experts.42.gate_proj.weight', 'model.layers.92.mlp.experts.42.up_proj.weight', 'model.layers.92.mlp.experts.43.down_proj.weight', 'model.layers.92.mlp.experts.43.gate_proj.weight', 'model.layers.92.mlp.experts.43.up_proj.weight', 'model.layers.92.mlp.experts.44.down_proj.weight', 'model.layers.92.mlp.experts.44.gate_proj.weight', 'model.layers.92.mlp.experts.44.up_proj.weight', 'model.layers.92.mlp.experts.45.down_proj.weight', 'model.layers.92.mlp.experts.45.gate_proj.weight', 'model.layers.92.mlp.experts.45.up_proj.weight', 'model.layers.92.mlp.experts.46.down_proj.weight', 'model.layers.92.mlp.experts.46.gate_proj.weight', 'model.layers.92.mlp.experts.46.up_proj.weight', 'model.layers.92.mlp.experts.47.down_proj.weight', 
'model.layers.92.mlp.experts.47.gate_proj.weight', 'model.layers.92.mlp.experts.47.up_proj.weight', 'model.layers.92.mlp.experts.48.down_proj.weight', 'model.layers.92.mlp.experts.48.gate_proj.weight', 'model.layers.92.mlp.experts.48.up_proj.weight', 'model.layers.92.mlp.experts.49.down_proj.weight', 'model.layers.92.mlp.experts.49.gate_proj.weight', 'model.layers.92.mlp.experts.49.up_proj.weight', 'model.layers.92.mlp.experts.5.down_proj.weight', 'model.layers.92.mlp.experts.5.gate_proj.weight', 'model.layers.92.mlp.experts.5.up_proj.weight', 'model.layers.92.mlp.experts.50.down_proj.weight', 'model.layers.92.mlp.experts.50.gate_proj.weight', 'model.layers.92.mlp.experts.50.up_proj.weight', 'model.layers.92.mlp.experts.51.down_proj.weight', 'model.layers.92.mlp.experts.51.gate_proj.weight', 'model.layers.92.mlp.experts.51.up_proj.weight', 'model.layers.92.mlp.experts.52.down_proj.weight', 'model.layers.92.mlp.experts.52.gate_proj.weight', 'model.layers.92.mlp.experts.52.up_proj.weight', 'model.layers.92.mlp.experts.53.down_proj.weight', 'model.layers.92.mlp.experts.53.gate_proj.weight', 'model.layers.92.mlp.experts.53.up_proj.weight', 'model.layers.92.mlp.experts.54.down_proj.weight', 'model.layers.92.mlp.experts.54.gate_proj.weight', 'model.layers.92.mlp.experts.54.up_proj.weight', 'model.layers.92.mlp.experts.55.down_proj.weight', 'model.layers.92.mlp.experts.55.gate_proj.weight', 'model.layers.92.mlp.experts.55.up_proj.weight', 'model.layers.92.mlp.experts.56.down_proj.weight', 'model.layers.92.mlp.experts.56.gate_proj.weight', 'model.layers.92.mlp.experts.56.up_proj.weight', 'model.layers.92.mlp.experts.57.down_proj.weight', 'model.layers.92.mlp.experts.57.gate_proj.weight', 'model.layers.92.mlp.experts.57.up_proj.weight', 'model.layers.92.mlp.experts.58.down_proj.weight', 'model.layers.92.mlp.experts.58.gate_proj.weight', 'model.layers.92.mlp.experts.58.up_proj.weight', 'model.layers.92.mlp.experts.59.down_proj.weight', 
'model.layers.92.mlp.experts.59.gate_proj.weight', 'model.layers.92.mlp.experts.59.up_proj.weight', 'model.layers.92.mlp.experts.6.down_proj.weight', 'model.layers.92.mlp.experts.6.gate_proj.weight', 'model.layers.92.mlp.experts.6.up_proj.weight', 'model.layers.92.mlp.experts.60.down_proj.weight', 'model.layers.92.mlp.experts.60.gate_proj.weight', 'model.layers.92.mlp.experts.60.up_proj.weight', 'model.layers.92.mlp.experts.61.down_proj.weight', 'model.layers.92.mlp.experts.61.gate_proj.weight', 'model.layers.92.mlp.experts.61.up_proj.weight', 'model.layers.92.mlp.experts.62.down_proj.weight', 'model.layers.92.mlp.experts.62.gate_proj.weight', 'model.layers.92.mlp.experts.62.up_proj.weight', 'model.layers.92.mlp.experts.63.down_proj.weight', 'model.layers.92.mlp.experts.63.gate_proj.weight', 'model.layers.92.mlp.experts.63.up_proj.weight', 'model.layers.92.mlp.experts.64.down_proj.weight', 'model.layers.92.mlp.experts.64.gate_proj.weight', 'model.layers.92.mlp.experts.64.up_proj.weight', 'model.layers.92.mlp.experts.65.down_proj.weight', 'model.layers.92.mlp.experts.65.gate_proj.weight', 'model.layers.92.mlp.experts.65.up_proj.weight', 'model.layers.92.mlp.experts.66.down_proj.weight', 'model.layers.92.mlp.experts.66.gate_proj.weight', 'model.layers.92.mlp.experts.66.up_proj.weight', 'model.layers.92.mlp.experts.67.down_proj.weight', 'model.layers.92.mlp.experts.67.gate_proj.weight', 'model.layers.92.mlp.experts.67.up_proj.weight', 'model.layers.92.mlp.experts.68.down_proj.weight', 'model.layers.92.mlp.experts.68.gate_proj.weight', 'model.layers.92.mlp.experts.68.up_proj.weight', 'model.layers.92.mlp.experts.69.down_proj.weight', 'model.layers.92.mlp.experts.69.gate_proj.weight', 'model.layers.92.mlp.experts.69.up_proj.weight', 'model.layers.92.mlp.experts.7.down_proj.weight', 'model.layers.92.mlp.experts.7.gate_proj.weight', 'model.layers.92.mlp.experts.7.up_proj.weight', 'model.layers.92.mlp.experts.70.down_proj.weight', 
'model.layers.92.mlp.experts.70.gate_proj.weight', 'model.layers.92.mlp.experts.70.up_proj.weight', 'model.layers.92.mlp.experts.71.down_proj.weight', 'model.layers.92.mlp.experts.71.gate_proj.weight', 'model.layers.92.mlp.experts.71.up_proj.weight', 'model.layers.92.mlp.experts.72.down_proj.weight', 'model.layers.92.mlp.experts.72.gate_proj.weight', 'model.layers.92.mlp.experts.72.up_proj.weight', 'model.layers.92.mlp.experts.73.down_proj.weight', 'model.layers.92.mlp.experts.73.gate_proj.weight', 'model.layers.92.mlp.experts.73.up_proj.weight', 'model.layers.92.mlp.experts.74.down_proj.weight', 'model.layers.92.mlp.experts.74.gate_proj.weight', 'model.layers.92.mlp.experts.74.up_proj.weight', 'model.layers.92.mlp.experts.75.down_proj.weight', 'model.layers.92.mlp.experts.75.gate_proj.weight', 'model.layers.92.mlp.experts.75.up_proj.weight', 'model.layers.92.mlp.experts.76.down_proj.weight', 'model.layers.92.mlp.experts.76.gate_proj.weight', 'model.layers.92.mlp.experts.76.up_proj.weight', 'model.layers.92.mlp.experts.77.down_proj.weight', 'model.layers.92.mlp.experts.77.gate_proj.weight', 'model.layers.92.mlp.experts.77.up_proj.weight', 'model.layers.92.mlp.experts.78.down_proj.weight', 'model.layers.92.mlp.experts.78.gate_proj.weight', 'model.layers.92.mlp.experts.78.up_proj.weight', 'model.layers.92.mlp.experts.79.down_proj.weight', 'model.layers.92.mlp.experts.79.gate_proj.weight', 'model.layers.92.mlp.experts.79.up_proj.weight', 'model.layers.92.mlp.experts.8.down_proj.weight', 'model.layers.92.mlp.experts.8.gate_proj.weight', 'model.layers.92.mlp.experts.8.up_proj.weight', 'model.layers.92.mlp.experts.80.down_proj.weight', 'model.layers.92.mlp.experts.80.gate_proj.weight', 'model.layers.92.mlp.experts.80.up_proj.weight', 'model.layers.92.mlp.experts.81.down_proj.weight', 'model.layers.92.mlp.experts.81.gate_proj.weight', 'model.layers.92.mlp.experts.81.up_proj.weight', 'model.layers.92.mlp.experts.82.down_proj.weight', 
'model.layers.92.mlp.experts.82.gate_proj.weight', 'model.layers.92.mlp.experts.82.up_proj.weight', 'model.layers.92.mlp.experts.83.down_proj.weight', 'model.layers.92.mlp.experts.83.gate_proj.weight', 'model.layers.92.mlp.experts.83.up_proj.weight', 'model.layers.92.mlp.experts.84.down_proj.weight', 'model.layers.92.mlp.experts.84.gate_proj.weight', 'model.layers.92.mlp.experts.84.up_proj.weight', 'model.layers.92.mlp.experts.85.down_proj.weight', 'model.layers.92.mlp.experts.85.gate_proj.weight', 'model.layers.92.mlp.experts.85.up_proj.weight', 'model.layers.92.mlp.experts.86.down_proj.weight', 'model.layers.92.mlp.experts.86.gate_proj.weight', 'model.layers.92.mlp.experts.86.up_proj.weight', 'model.layers.92.mlp.experts.87.down_proj.weight', 'model.layers.92.mlp.experts.87.gate_proj.weight', 'model.layers.92.mlp.experts.87.up_proj.weight', 'model.layers.92.mlp.experts.88.down_proj.weight', 'model.layers.92.mlp.experts.88.gate_proj.weight', 'model.layers.92.mlp.experts.88.up_proj.weight', 'model.layers.92.mlp.experts.89.down_proj.weight', 'model.layers.92.mlp.experts.89.gate_proj.weight', 'model.layers.92.mlp.experts.89.up_proj.weight', 'model.layers.92.mlp.experts.9.down_proj.weight', 'model.layers.92.mlp.experts.9.gate_proj.weight', 'model.layers.92.mlp.experts.9.up_proj.weight', 'model.layers.92.mlp.experts.90.down_proj.weight', 'model.layers.92.mlp.experts.90.gate_proj.weight', 'model.layers.92.mlp.experts.90.up_proj.weight', 'model.layers.92.mlp.experts.91.down_proj.weight', 'model.layers.92.mlp.experts.91.gate_proj.weight', 'model.layers.92.mlp.experts.91.up_proj.weight', 'model.layers.92.mlp.experts.92.down_proj.weight', 'model.layers.92.mlp.experts.92.gate_proj.weight', 'model.layers.92.mlp.experts.92.up_proj.weight', 'model.layers.92.mlp.experts.93.down_proj.weight', 'model.layers.92.mlp.experts.93.gate_proj.weight', 'model.layers.92.mlp.experts.93.up_proj.weight', 'model.layers.92.mlp.experts.94.down_proj.weight', 
'model.layers.92.mlp.experts.94.gate_proj.weight', 'model.layers.92.mlp.experts.94.up_proj.weight', 'model.layers.92.mlp.experts.95.down_proj.weight', 'model.layers.92.mlp.experts.95.gate_proj.weight', 'model.layers.92.mlp.experts.95.up_proj.weight', 'model.layers.92.mlp.experts.96.down_proj.weight', 'model.layers.92.mlp.experts.96.gate_proj.weight', 'model.layers.92.mlp.experts.96.up_proj.weight', 'model.layers.92.mlp.experts.97.down_proj.weight', 'model.layers.92.mlp.experts.97.gate_proj.weight', 'model.layers.92.mlp.experts.97.up_proj.weight', 'model.layers.92.mlp.experts.98.down_proj.weight', 'model.layers.92.mlp.experts.98.gate_proj.weight', 'model.layers.92.mlp.experts.98.up_proj.weight', 'model.layers.92.mlp.experts.99.down_proj.weight', 'model.layers.92.mlp.experts.99.gate_proj.weight', 'model.layers.92.mlp.experts.99.up_proj.weight', 'model.layers.92.mlp.gate.e_score_correction_bias', 'model.layers.92.mlp.gate.weight', 'model.layers.92.mlp.shared_experts.down_proj.weight', 'model.layers.92.mlp.shared_experts.gate_proj.weight', 'model.layers.92.mlp.shared_experts.up_proj.weight', 'model.layers.92.post_attention_layernorm.weight', 'model.layers.92.self_attn.k_norm.weight', 'model.layers.92.self_attn.k_proj.bias', 'model.layers.92.self_attn.k_proj.weight', 'model.layers.92.self_attn.o_proj.weight', 'model.layers.92.self_attn.q_norm.weight', 'model.layers.92.self_attn.q_proj.bias', 'model.layers.92.self_attn.q_proj.weight', 'model.layers.92.self_attn.v_proj.bias', 'model.layers.92.self_attn.v_proj.weight', 'model.layers.92.shared_head.norm.weight']
- This IS expected if you are initializing Glm4MoeForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Glm4MoeForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
The module name  (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
๐Ÿ“ฅ Loading model...
Loading checkpoint shards: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 93/93 [01:01<00:00,  1.52it/s]
Some weights of the model checkpoint at /media/data/models/GLM-4.6/ were not used when initializing Glm4MoeForCausalLM: ['model.layers.92.eh_proj.weight', 'model.layers.92.enorm.weight', ..., 'model.layers.92.shared_head.norm.weight'] (identical layer-92 weight list and follow-up notes as in the first run above; truncated here).
The module name  (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
๐Ÿ“Š Loading dataset: HuggingFaceH4/ultrachat_200k
โœ… Dataset prepared with 512 samples
โš™๏ธ  Setting up W4A16 quantization recipe...
๐Ÿ”ง Ignoring the following patterns from quantization:
       lm_head
       re:.*\.mlp\.gate$
       re:.*\.self_attn\..*$
       re:.*\.shared_expert\..*$
       re:.*\.shared_experts\..*$
       re:.*\.mlp\.shared_expert_gate$
       re:.*\.linear_attn\..*$
   ๐Ÿ†• re:model\.layers\.[0-2]\.mlp\..*$
๐ŸŽฏ Starting one-shot quantization...
2025-11-25T10:29:31.549728+0800 | reset | INFO - Compression lifecycle reset
2025-11-25T10:29:31.644540+0800 | _create_default_logger | INFO - Logging all LLM Compressor modifier-level logs to sparse_logs/25-11-2025_10.29.31.log
2025-11-25T10:29:31.644825+0800 | from_modifiers | INFO - Creating recipe from modifiers
2025-11-25T10:29:39.480906+0800 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2025-11-25T10:29:39.481037+0800 | IndependentPipeline | INFO - Inferred `SequentialPipeline` for `GPTQModifier`
Preparing cache: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:01<00:00, 270.65it/s]
(1/93): Calibrating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:07<00:00, 68.23it/s]
(1/93): Propagating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:09<00:00, 53.89it/s]
(2/93): Calibrating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:07<00:00, 68.92it/s]
(2/93): Propagating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:08<00:00, 59.74it/s]
(3/93): Calibrating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:07<00:00, 68.34it/s]
(3/93): Propagating: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 512/512 [00:07<00:00, 64.90it/s]
(4/93): Calibrating:   0%|                                                                                                                                                                 | 0/512 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 73, in forward
    outputs = forward_fn(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 5, in forward
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 395, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 345, in forward
    hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 331, in moe
    expert_output = expert(expert_input)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/transformers/models/glm4_moe/modeling_glm4_moe.py", line 223, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                                                                ^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1840, in inner
    hook_result = hook(self, args, result)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/utils/hooks.py", line 93, in wrapped_hook
    return hook(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/quantization/gptq/base.py", line 230, in calibrate_module
    self._hessians[module] = make_empty_hessian(module, device=init_device)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py", line 30, in make_empty_hessian
    return torch.zeros((num_columns, num_columns), device=device, dtype=GPTQ_PRECISION)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB. GPU 0 has a total capacity of 23.57 GiB of which 15.19 MiB is free. Process 1797596 has 254.00 MiB memory in use. Including non-PyTorch memory, this process has 23.28 GiB memory in use. Of the allocated memory 22.41 GiB is allocated by PyTorch, and 580.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/work/ktransformers/ktransformers/kt-kernel/scripts/convert_gpu_weights.py", line 450, in <module>
    main()
  File "/work/ktransformers/ktransformers/kt-kernel/scripts/convert_gpu_weights.py", line 434, in main
    oneshot(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 330, in oneshot
    one_shot()
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 158, in __call__
    self.apply_recipe_modifiers(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/entrypoints/oneshot.py", line 201, in apply_recipe_modifiers
    pipeline(
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 104, in __call__
    subgraph.forward(model, **inputs)
  File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 75, in forward
    raise RuntimeError(
RuntimeError: Raised an exception during execution of the following code:

1 
2 
3 
4 def forward(self, wrapped_5, model_layers_2, getitem_1, model_rotary_emb, getitem_3):
5     model_layers_3 = getattr(self.model.layers, "3")(model_layers_2, attention_mask = wrapped_5, position_ids = getitem_3, past_key_values = None, cache_position = getitem_1, position_embeddings = model_rotary_emb);  model_layers_2 = wrapped_5 = getitem_3 = getitem_1 = model_rotary_emb = None
6     return {'model_layers_3': model_layers_3}
7     

CodeZ-Hao · Nov 25 '25 02:11
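The failing allocation in the trace is GPTQ's per-module Hessian: `make_empty_hessian` creates a square float32 matrix whose side equals the layer's input width, on the GPU, for every module hooked in the current subgraph. Some back-of-the-envelope arithmetic (the input width is inferred from the 100 MiB figure in the OOM message, and the expert count is an assumption; read both from the checkpoint's `config.json`):

```python
def hessian_mib(in_features: int) -> float:
    """MiB needed for GPTQ's float32 (in_features x in_features) Hessian."""
    return in_features * in_features * 4 / 2**20

# The 100.00 MiB allocation in the OOM message matches a 5120-wide input:
print(hessian_mib(5120))  # 100.0

# If every routed expert in one MoE layer needs Hessians for gate_proj
# and up_proj alone (down_proj's width differs), the per-layer total is
# roughly n_experts * 2 * 100 MiB. n_experts is hypothetical here.
n_experts = 128
print(f"~{n_experts * 2 * hessian_mib(5120) / 1024:.0f} GiB per MoE layer")
```

Since the Hessians alone can exceed the 3090's 24 GiB, the `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` hint from the error message can at best delay the failure, and `--force_cpu` plausibly does not help because the Hessians are allocated on `init_device` independently of where the weights live. Keeping Hessians off the GPU (recent llm-compressor versions expose an `offload_hessians` option on `GPTQModifier`, if the installed version supports it) or ignoring more layers, as the `re:model\.layers\.[0-2]\.mlp\..*$` pattern already does, would address the root cause.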