
XPU does not support FP8 models as input due to Triton

Open wenhuach21 opened this issue 3 months ago • 2 comments

The XPU environment ships its own Triton.

Without Triton installed:

2025-09-17 02:10:27 INFO llm.py L488: start to quantize Qwen/Qwen3-0.6B-FP8
2025-09-17 02:10:28 WARNING modeling_utils.py L4793: `torch_dtype` is deprecated! Use `dtype` instead!
2025-09-17 02:10:28 WARNING quantizer_finegrained_fp8.py L61: You have loaded an FP8 model on CPU and have a CUDA or XPU device available, make sure to set your model on a GPU or XPU device in order to run your model. To remove this warning, pass device_map = 'cuda' or 'xpu'.
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06G/1.06G [00:02<00:00, 432MB/s]
Traceback (most recent call last):
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 104, in <module>
    run()
  File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 59, in run
    tune(args)
  File "/home/gta/wenhuach/auto-round/auto_round/script/llm.py", line 502, in tune
    model, tokenizer, low_cpu_mem_usage = llm_load_model(
  File "/home/gta/wenhuach/auto-round/auto_round/utils.py", line 1435, in llm_load_model
    model = model_cls.from_pretrained(
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 288, in _wrapper
    return func(*args, **kwargs)
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5130, in from_pretrained
    hf_quantizer.preprocess_model(
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/base.py", line 238, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/quantizer_finegrained_fp8.py", line 167, in _process_model_before_weight_loading
    from ..integrations.finegrained_fp8 import replace_with_fp8_linear
  File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/integrations/finegrained_fp8.py", line 36, in <module>
    @triton.jit
AttributeError: module 'triton' has no attribute 'jit'
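The failure above happens because the `triton` module that is importable in an XPU environment may be a stub or fork that does not expose the standard API. A minimal sketch of a pre-flight check (the helper name `triton_supports_jit` is hypothetical, not part of auto-round or transformers):

```python
import importlib.util


def triton_supports_jit() -> bool:
    """Return True only if a `triton` module is importable and exposes
    `triton.jit`, which transformers' finegrained FP8 integration requires."""
    if importlib.util.find_spec("triton") is None:
        return False
    import triton
    return hasattr(triton, "jit")
```

Running this before loading an FP8 checkpoint would surface the incompatibility early instead of failing deep inside `from_pretrained`.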

After installing Triton:

Traceback (most recent call last):
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 196, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/runpy.py", line 86, in _run_code
   exec(code, run_globals)
 File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 104, in <module>
   run()
 File "/home/gta/wenhuach/auto-round/auto_round/__main__.py", line 59, in run
   tune(args)
 File "/home/gta/wenhuach/auto-round/auto_round/script/llm.py", line 502, in tune
   model, tokenizer, low_cpu_mem_usage = llm_load_model(
 File "/home/gta/wenhuach/auto-round/auto_round/utils.py", line 1435, in llm_load_model
   model = model_cls.from_pretrained(
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
   return model_class.from_pretrained(
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 288, in _wrapper
   return func(*args, **kwargs)
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5130, in from_pretrained
   hf_quantizer.preprocess_model(
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/base.py", line 238, in preprocess_model
   return self._process_model_before_weight_loading(model, **kwargs)
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/quantizers/quantizer_finegrained_fp8.py", line 167, in _process_model_before_weight_loading
   from ..integrations.finegrained_fp8 import replace_with_fp8_linear
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/transformers/integrations/finegrained_fp8.py", line 24, in <module>
   import triton
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/__init__.py", line 8, in <module>
   from .runtime import (
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/__init__.py", line 1, in <module>
   from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 9, in <module>
   from .jit import KernelInterface
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/jit.py", line 12, in <module>
   from ..runtime.driver import driver
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/runtime/driver.py", line 1, in <module>
   from ..backends import backends
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 50, in <module>
   backends = _discover_backends()
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 43, in _discover_backends
   compiler = _load_module(name, os.path.join(root, name, 'compiler.py'))
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/__init__.py", line 12, in _load_module
   spec.loader.exec_module(module)
 File "/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/backends/intel/compiler.py", line 2, in <module>
   from triton._C.libtriton import ir, passes, llvm, intel
ImportError: cannot import name 'intel' from 'triton._C.libtriton' (/home/gta/miniforge3/envs/hm_unitrace/lib/python3.10/site-packages/triton/_C/libtriton.so)
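This second failure suggests that installing stock Triton replaced `libtriton.so` while the Intel backend plugin (`triton/backends/intel`) from the XPU build was left behind, so backend discovery tries to load an `intel` symbol that the new shared library does not contain. A hedged sketch for inspecting which backends the installed Triton actually discovered (the helper `available_triton_backends` is illustrative, not an existing API):

```python
def available_triton_backends() -> list:
    """Return the backend names Triton discovered at import time
    (e.g. 'nvidia', 'amd', 'intel'), or [] if Triton is missing or
    its backend discovery itself fails."""
    try:
        from triton.backends import backends
    except Exception:
        return []
    return sorted(backends.keys())
```

If `'intel'` is absent from the result on an XPU machine, the stock Triton wheel has likely clobbered the XPU-specific build.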

wenhuach21 avatar Sep 17 '25 02:09 wenhuach21

Hi, is there a plan to fix this? Currently, installing torch==2.8.0+xpu or torch==2.9.0+xpu automatically installs pytorch-triton-xpu, and when I use auto-round I get this WARNING:

2025-11-26 09:42:20 WARNING _logger.py L68: AutoScheme is currently supported only on Linux.
Loading 2-bit model: D:\StreamingMedia\quantize\2bits\autoround\Qwen3-0.6B_2bit_parquet
2025-11-26 09:42:26 WARNING _logger.py L68: Better backend is found, please install all the following requirements to enable it.
2025-11-26 09:42:26 WARNING _logger.py L68: `pip install "triton>=2.0"`
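The warning appears to stem from two distributions competing for the same `triton` package: stock `triton` (which auto-round's backend check asks for) and `pytorch-triton-xpu` (which the XPU torch wheels pull in). A small diagnostic sketch, assuming these two distribution names (check your environment's actual package names):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_triton_dists() -> dict:
    """Report which Triton distributions are installed and their versions.
    Both distributions install a top-level `triton` package, so having
    both present (or upgrading one over the other) leaves a broken mix."""
    found = {}
    for dist in ("triton", "pytorch-triton-xpu"):
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            pass  # distribution not installed
    return found
```

Seeing both keys in the output would indicate the conflicting install that produces the `intel` ImportError above.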

xiaohoua avatar Nov 26 '25 02:11 xiaohoua

We will provide our own kernel to support this in the next release; ETA is 3 weeks.

wenhuach21 avatar Nov 26 '25 02:11 wenhuach21