auto-round
XPU does not support an FP8 model as input because of Triton: the XPU environment ships its own Triton build, which conflicts with the generic one.

With Triton not installed:
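Before running quantization, one quick way to tell which of the two failure modes below applies is to probe the installed Triton. This is a minimal diagnostic sketch in plain Python, not part of the auto-round API:

```python
import importlib.util


def diagnose_triton():
    """Report whether a usable Triton is importable.

    Transformers' finegrained_fp8 integration needs `triton.jit`;
    an XPU-specific or partially installed Triton can be present
    yet fail to import, or import without the `jit` attribute.
    """
    if importlib.util.find_spec("triton") is None:
        return "triton is not installed"
    try:
        import triton
    except Exception as exc:  # e.g. a backend ImportError during discovery
        return f"triton is present but fails to import: {exc}"
    if not hasattr(triton, "jit"):
        return "triton imports but has no `jit` (likely a broken or stub install)"
    return f"triton {getattr(triton, '__version__', 'unknown')} looks usable"


print(diagnose_triton())
```

Each return branch corresponds to one of the errors reported below: the missing-`jit` case matches the first traceback, and the failed-import case matches the second.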
2025-09-17 02:10:27 INFO llm.py L488: start to quantize Qwen/Qwen3-0.6B-FP8
2025-09-17 02:10:28 WARNING modeling_utils.py L4793: `torch_dtype` is deprecated! Use `dtype` instead!
2025-09-17 02:10:28 WARNING quantizer_finegrained_fp8.py L61: You have loaded an FP8 model on CPU and have a CUDA or XPU device available, make sure to set your model on a GPU or XPU device in order to run your model. To remove this warning, pass device_map = 'cuda' or 'xpu'.
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06G/1.06G [00:02<00:00, 432MB/s]
Traceback (most recent call last):
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 104, in <module>
run()
File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 59, in run
tune(args)
File "/home/gta/wenhuach/auto-round/auto_round/script/llm.py", line 502, in tune
model, tokenizer, low_cpu_mem_usage = llm_load_model(
File "/home/gta/wenhuach/auto-round/auto_round/utils.py", line 1435, in llm_load_model
model = model_cls.from_pretrained(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
return model_class.from_pretrained(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 288, in _wrapper
return func(*args, **kwargs)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5130, in from_pretrained
hf_quantizer.preprocess_model(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/base.py", line 238, in preprocess_model
return self._process_model_before_weight_loading(model, **kwargs)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/quantizer_finegrained_fp8.py", line 167, in _process_model_before_weight_loading
from ..integrations.finegrained_fp8 import replace_with_fp8_linear
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/integrations/finegrained_fp8.py", line 36, in <module>
@triton.jit
AttributeError: module 'triton' has no attribute 'jit'
After installing Triton:
Traceback (most recent call last):
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 104, in <module>
run()
File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 59, in run
tune(args)
File "/home/gta/wenhuach/auto-round/auto_round/script/llm.py", line 502, in tune
model, tokenizer, low_cpu_mem_usage = llm_load_model(
File "/home/gta/wenhuach/auto-round/auto_round/utils.py", line 1435, in llm_load_model
model = model_cls.from_pretrained(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
return model_class.from_pretrained(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 288, in _wrapper
return func(*args, **kwargs)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5130, in from_pretrained
hf_quantizer.preprocess_model(
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/base.py", line 238, in preprocess_model
return self._process_model_before_weight_loading(model, **kwargs)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/quantizer_finegrained_fp8.py", line 167, in _process_model_before_weight_loading
from ..integrations.finegrained_fp8 import replace_with_fp8_linear
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/integrations/finegrained_fp8.py", line 24, in <module>
import triton
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/__init__.py", line 8, in <module>
from .runtime import (
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/__init__.py", line 1, in <module>
from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 9, in <module>
from .jit import KernelInterface
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/jit.py", line 12, in <module>
from ..runtime.driver import driver
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/driver.py", line 1, in <module>
from ..backends import backends
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 50, in <module>
backends = _discover_backends()
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 43, in _discover_backends
compiler = _load_module(name, os.path.join(root, name, 'compiler.py'))
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 12, in _load_module
spec.loader.exec_module(module)
File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/intel/compiler.py", line 2, in <module>
from triton._C.libtriton import ir, passes, llvm, intel
ImportError: cannot import name 'intel' from 'triton._C.libtriton' (/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/_C/libtriton.so)
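The ImportError above suggests the generic `triton` wheel was picked up even though the Intel backend expects the XPU build. A hedged way to see which distribution actually provides the `triton` module (the candidate distribution names here are assumptions based on this thread):

```python
from importlib import metadata

# Distributions that are known (or assumed, per this thread) to ship
# the `triton` Python module.
CANDIDATES = ("triton", "pytorch-triton-xpu", "pytorch-triton")


def installed_triton_dists():
    """Return the installed distributions that could provide `triton`."""
    found = []
    for name in CANDIDATES:
        try:
            found.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pass  # candidate not installed; skip it
    return found


print(installed_triton_dists() or "no known triton distribution installed")
```

If both the generic `triton` wheel and an XPU variant show up, the import above can resolve to the wrong one, which would explain the missing `intel` symbol in `triton._C.libtriton`.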
Hi, do we have a plan to fix this? Currently, installing torch==2.8.0+xpu or torch==2.9.0+xpu automatically pulls in pytorch-triton-xpu, and running auto-round then produces these warnings:
2025-11-26 09:42:20 WARNING _logger.py L68: AutoScheme is currently supported only on Linux.
Loading the 2-bit model: D:\StreamingMedia\quantize\2bits\autoround\Qwen3-0.6B_2bit_parquet
2025-11-26 09:42:26 WARNING _logger.py L68: Better backend is found, please install all the following requirements to enable it.
2025-11-26 09:42:26 WARNING _logger.py L68: `pip install "triton>=2.0"`
We will provide our own kernel to support this in the next release; the ETA is about 3 weeks.