[BUG] RuntimeError: Numpy is not available
Describe the bug
INFO Packing model...
INFO Packing Kernel: Auto-selection: adding candidate `TorchQuantLinear`
INFO Kernel: candidates -> `[TorchQuantLinear]`
INFO Kernel: selected -> `TorchQuantLinear`.
Packing model.layers.0.mlp.gate_proj [5 of 224] █---------------------------------------------------------------| 0:00:00 / 0:00:00 [5/224] 2.2%
Traceback (most recent call last):
File "/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/quant_deepseek_autoround.py", line 79, in <module>
main()
File "/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/quant_deepseek_autoround.py", line 43, in main
model.quantize(examples)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/models/base.py", line 421, in quantize
return module_looper.loop(
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/looper/module_looper.py", line 441, in loop
reverse_p.finalize(model=self.gptq_model, **kwargs)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/looper/gptq_processor.py", line 200, in finalize
model.qlinear_kernel = pack_model(
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 592, in pack_model
for _ in executor.map(wrapper, names):
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 590, in wrapper
pack_module(name, qModules, quant_result, modules)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 529, in pack_module
qModules[name].pack(linear=layers[name], scales=scale, zeros=zero, g_idx=g_idx)
File "/home/david/miniconda3/envs/gptqmodel/lib/python3.10/site-packages/gptqmodel/nn_modules/qlinear/__init__.py", line 469, in pack
int_weight = int_weight.numpy().astype(self.pack_np_math_dtype)
RuntimeError: Numpy is not available
I used the code from "/GPTQModel/examples/quantization/basic_usage_autoround.py" to quantize deepseek-ai/DeepSeek-R1-Distill-Llama-8B and Qwen/QwQ-32B, but I encountered the same issue in both cases.
GPU Info
nvidia-smi: NVIDIA A6000
Software Info
CUDA Version: 12.8
pip show output:
Name: gptqmodel
Version: 2.0.1.dev0
---
Name: torch
Version: 2.2.0
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 2.2.0
---
Name: numpy
Version: 2.2.3
(gptqmodel) david@asus-ESC4000-E11:/mnt/8tb_raid/david_model/GPTQModel$ pip list
Package Version
------------------------ -----------
accelerate 1.3.0
aiohappyeyeballs 2.5.0
aiohttp 3.11.13
aiosignal 1.3.2
async-timeout 5.0.1
attrs 25.1.0
certifi 2025.1.31
charset-normalizer 3.4.1
datasets 3.3.2
device-smi 0.4.1
dill 0.3.8
filelock 3.17.0
frozenlist 1.5.0
fsspec 2024.12.0
gptqmodel 2.0.1.dev0
hf_transfer 0.1.9
huggingface-hub 0.29.2
idna 3.10
Jinja2 3.1.6
logbar 0.0.3
MarkupSafe 3.0.2
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.4.2
numpy 2.2.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.1.105
packaging 24.2
pandas 2.2.3
pillow 11.1.0
pip 25.0
propcache 0.3.0
protobuf 6.30.0
psutil 7.0.0
pyarrow 19.0.1
python-dateutil 2.9.0.post0
pytz 2025.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
safetensors 0.5.3
setuptools 75.8.0
six 1.17.0
sympy 1.13.1
threadpoolctl 3.5.0
tokenicer 0.0.4
tokenizers 0.21.0
torch 2.2.0
tqdm 4.67.1
transformers 4.49.0
triton 2.2.0
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
wheel 0.45.1
xxhash 3.5.0
yarl 1.18.3
My code:
# Copyright 2024-2025 ModelCloud.ai
# Copyright 2024-2025 [email protected]
# Contact: [email protected], x.com/qubitium
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
from gptqmodel import GPTQModel
from gptqmodel.quantization.config import AutoRoundQuantizeConfig # noqa: E402
from transformers import AutoTokenizer
pretrained_model_id = "Qwen/QwQ-32B" # "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
quantized_model_id = "./autoround/Qwen-QwQ-32B-4bit-32g"
def main():
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)

    examples = [
        tokenizer(
            "gptqmodel is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
        )
    ]

    quantize_config = AutoRoundQuantizeConfig(
        bits=4,
        group_size=32
    )

    model = GPTQModel.load(
        pretrained_model_id,
        quantize_config=quantize_config,
    )
    model.quantize(examples)

    model.save(quantized_model_id)
    tokenizer.save_pretrained(quantized_model_id)
    del model

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = GPTQModel.from_quantized(
        quantized_model_id,
        device=device,
    )
    input_ids = torch.ones((1, 1), dtype=torch.long, device=device)
    outputs = model(input_ids=input_ids)
    print(f"output logits {outputs.logits.shape}: \n", outputs.logits)

    # inference with model.generate
    print(
        tokenizer.decode(
            model.generate(
                **tokenizer("gptqmodel is", return_tensors="pt").to(model.device)
            )[0]
        )
    )


if __name__ == "__main__":
    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
        level=logging.INFO,
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    main()
Thank you!!!!!!!!!!!!!!!!
@davidray222 Thanks for the report. I think your torch version 2.2 may be too old. Or maybe I broke something in main. Will check soon!
@davidray222 I think you have a broken Numpy package install. Try the following:
import torch
t = torch.tensor([1, 2, 3], dtype=torch.int32)
print(t.numpy())
Run the above Python code in your env. If you get the same error, I suggest you uninstall and re-install numpy.
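If a clean re-install doesn't change anything, it may also be worth printing the installed versions side by side. This is only a hedged sketch of that check: torch 2.2 wheels predate NumPy 2.x, and that mismatch is one plausible cause of the "RuntimeError: Numpy is not available" seen during packing.
import numpy
import torch

# print versions side by side; torch 2.2 together with numpy 2.2.x is a suspect combination
print("numpy:", numpy.__version__)
print("torch:", torch.__version__)

# same round-trip as above: fails with "Numpy is not available" if torch cannot use the installed numpy
t = torch.tensor([1, 2, 3], dtype=torch.int32)
print(t.numpy())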
@davidray222 Also please use the release version 2.0 if possible. There may be other bugs in the main/devel branch. There are some changes on main that have not been CI validated.
@Qubitium I successfully quantized DeepSeek-R1-Distill-Llama-8B into a 4-bit model, but I encountered an issue where, no matter what input I provide, the output is always "!!!!!!!!". Could this be due to a mistake in my process, or is it related to an issue with my environment or version compatibility?
I updated my software versions:
CUDA Version: 12.8
pip show output:
Name: gptqmodel
Version: 2.0.0
---
Name: torch
Version: 2.4.0
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 3.0.0
---
Name: numpy
Version: 2.2.2
My steps:
1. I used the code from "/GPTQModel/examples/quantization/basic_usage_autoround.py" to quantize deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
2. I got these files
I used this code to run inference:
# shell command (not Python): enable ModelScope downloads
export GPTQMODEL_USE_MODELSCOPE=True

from gptqmodel import GPTQModel

# load the locally quantized DeepSeek-R1-Distill-Llama-8B 4-bit model
model = GPTQModel.load("/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/autoround/DeepSeek-R1-Distill-Llama-8B-4bit-32g/")
result = model.generate("hello")[0]  # tokens
print(model.tokenizer.decode(result))  # string output
The output I get is always "!!!!!!!!", no matter the input. I would like to ask if I made a mistake somewhere. Thank you!!
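As a hedged side note, a variant that tokenizes the prompt explicitly and moves it to the model's device, mirroring the example script above, can help rule out prompt handling as the cause. This is only a sketch; the path is simply the same local quantized model.
from gptqmodel import GPTQModel

model = GPTQModel.load("/mnt/8tb_raid/david_model/GPTQModel/examples/quantization/autoround/DeepSeek-R1-Distill-Llama-8B-4bit-32g/")

# tokenize explicitly and move the tensors to the model's device, as in the example script
inputs = model.tokenizer("gptqmodel is", return_tensors="pt").to(model.device)
tokens = model.generate(**inputs)[0]
print(model.tokenizer.decode(tokens))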
I am getting a similar error:
ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in _get_module(self, module_name)
   1862     try:
-> 1863         return importlib.import_module("." + module_name, self.__name__)
   1864     except Exception as e:

47 frames

ImportError: cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in _get_module(self, module_name)
   1863         return importlib.import_module("." + module_name, self.__name__)
   1864     except Exception as e:
-> 1865         raise RuntimeError(
   1866             f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1867             f" traceback):\n{e}"

RuntimeError: Failed to import transformers.models.auto.tokenization_auto because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
cannot import name '_center' from 'numpy._core.umath' (/usr/local/lib/python3.11/dist-packages/numpy/_core/umath.py)
from running
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
My library versions:
Name: gptqmodel
Version: 2.0.0
---
Name: torch
Version: 2.6.0+cu124
---
Name: transformers
Version: 4.49.0
---
Name: accelerate
Version: 1.3.0
---
Name: triton
Version: 3.2.0
---
Name: numpy
Version: 2.2.4
When I downgrade numpy to 2.2.2, I get:
ImportError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.11/dist-packages/gptqmodel/utils/model.py in <module>
ImportError: cannot import name 'SUPPORTED_MODELS' from 'gptqmodel.models._const' (/usr/local/lib/python3.11/dist-packages/gptqmodel/models/_const.py)
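One thing that may be worth checking (an assumption on my part, not a confirmed diagnosis): a missing symbol such as SUPPORTED_MODELS can come from a stale or mixed gptqmodel install, so confirming which distribution and path Python actually resolves may narrow it down. A minimal sketch that avoids triggering the failing package-level imports:
import importlib.metadata
import importlib.util

# report the installed gptqmodel distribution version and the package path that would be
# imported, without importing the package itself
print(importlib.metadata.version("gptqmodel"))       # expected: 2.0.0
print(importlib.util.find_spec("gptqmodel").origin)  # e.g. .../dist-packages/gptqmodel/__init__.py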