ipex-llm
Docker run: RuntimeError: UR error in /ipex-llm/python/llm/example/GPU/HuggingFace/Multimodal/internvl2
Describe the bug
Using the Docker image https://hub.docker.com/r/intelanalytics/ipex-llm-serving-xpu
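For context, the container was presumably started along these lines; this is a minimal sketch, and the image tag, container name, and host model directory are assumptions rather than details from the original report:

# Map the Intel GPU into the container and mount the host model directory
# so that /llm/models matches the paths used in the reproduction below.
docker run -itd \
    --name ipex-llm-serving-xpu \
    --net=host \
    --device=/dev/dri \
    --shm-size=16g \
    -v /path/to/host/models:/llm/models \
    intelanalytics/ipex-llm-serving-xpu:latest
docker exec -it ipex-llm-serving-xpu bash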
How to reproduce
Steps to reproduce the error:
cd /ipex-llm/python/llm/example/GPU/HuggingFace/Multimodal/internvl2
REPO_ID_OR_MODEL_PATH=/llm/models/OpenGVLab/InternVL2-1B/
N_PREDICT=200
IMAGE_URL_OR_PATH=/llm/models/demo.png
PROMPT="描述图片内容"
python ./chat.py --repo-id-or-model-path $REPO_ID_OR_MODEL_PATH --prompt $PROMPT --n-predict $N_PREDICT --image-url-or-path $IMAGE_URL_OR_PATH --modelscope
Hi @wmx-github
The error message points to a transformers version mismatch:
File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 413, in forward
"full_attention": create_causal_mask(**mask_kwargs),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/masking_utils.py", line 753, in create_causal_mask
early_exit, attention_mask, packed_sequence_mask, kv_length, kv_offset = _preprocess_mask_arguments(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/masking_utils.py", line 688, in _preprocess_mask_arguments
attention_mask = attention_mask.to(device=cache_position.device, dtype=torch.bool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: UR error
In your environment transformers is 4.53.1; the masking_utils code path shown in the traceback does not exist in 4.37.0, which is the version this example expects. Can you downgrade it with:
pip install transformers==4.37.0
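After downgrading, a quick sanity check before re-running the example (a generic check, not from the original thread) is to confirm which version the container's Python actually picks up:

# Confirm the downgrade took effect, then re-run the reproduction command
python -c "import transformers; print(transformers.__version__)"   # expect 4.37.0
python ./chat.py --repo-id-or-model-path $REPO_ID_OR_MODEL_PATH --prompt $PROMPT --n-predict $N_PREDICT --image-url-or-path $IMAGE_URL_OR_PATH --modelscope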