Mike Yang

8 issues reported by Mike Yang

### Describe the bug
Running the test script `intel-extension-for-pytorch/examples/gpu/inference/python/llm/run_benchmark.sh` fails with the following error: Namespace(model_id='/home/llm/disk/llm/meta-llama/Llama-2-7b-hf', sub_model_name='llama2-7b', device='xpu', dtype='float16', input_tokens='1024', max_new_tokens=128, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', num_beams=4,...

XPU/GPU
LLM

python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh Currently, the following code enables the setting only on Intel Core CPUs; on Intel Xeon CPUs, `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` also needs to be enabled to improve performance (see the sketch below). ``` if grep...
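A minimal sketch of the requested change, assuming the check reads `/proc/cpuinfo` as the truncated snippet above suggests; the grep pattern and the value `1` are assumptions, not the repository's actual code:

```bash
#!/bin/bash
# Hypothetical sketch: enable Level Zero immediate command lists on both
# Intel Core and Intel Xeon CPUs instead of Core only.
if grep -qE "Core|Xeon" /proc/cpuinfo; then
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
fi
```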

user issue

The current all-in-one benchmark saves the CSV file with a name that contains only the date. If we run multiple tests on the same day, the older test data will be overwritten...
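One possible fix, sketched under the assumption that the script composes the file name itself (the variable name below is hypothetical): include the time of day so several runs on the same date write distinct files.

```bash
#!/bin/bash
# Hypothetical naming scheme: date plus time down to the second, so a second
# run on the same day no longer overwrites the first.
csv_file="all-in-one-$(date +%Y-%m-%d-%H%M%S).csv"
echo "writing results to ${csv_file}"
```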

user issue

Every time I run the test, it loads the original model and converts it to a lower-bit format. If we load a 34B model on 4 ARC cards, it will...

user issue

(ipex-llm-0812) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0812/python/llm/dev/benchmark/all-in-one$ bash run-deepspeed-arc.sh :: initializing oneAPI environment ... run-deepspeed-arc.sh: BASH_VERSION = 5.1.16(1)-release args: Using "$@" for oneapi-vars.sh arguments: --force :: advisor -- processing etc/advisor/vars.sh :: ccl -- processing etc/ccl/vars.sh...

user issue

With the ipex-llm Docker container `intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2`, the model loads successfully on 4 ARC cards, but loading it on 8 ARC cards fails with the following error. root@GPU-Xeon4410Y-ARC770:/llm# bash start-vllm-service.sh /usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13:...
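As a first diagnostic step (an assumption on my part, not a fix from this issue): `sycl-ls` is the standard oneAPI tool for listing SYCL devices, and `ZE_AFFINITY_MASK` is the standard Level Zero device selector, so one can check that all 8 cards are actually visible inside the container before starting the service.

```bash
# List the SYCL devices visible inside the container; 8 ARC cards should show
# up as 8 Level Zero GPU entries.
sycl-ls

# Optionally restrict the service to an explicit set of devices; the indices
# here are illustrative, not taken from the issue.
export ZE_AFFINITY_MASK=0,1,2,3,4,5,6,7
```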

user issue

### The vLLM docker image is
`intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1`
### The vLLM start command is
`model="/llm/models/Qwen2-72B-Instruct/" served_model_name="Qwen2-72B-Instruct" source /opt/intel/1ccl-wks/setvars.sh export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2 python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \ --served-model-name $served_model_name \ --port 8000 \ --model $model...`
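For context, a hypothetical complete launch script might look like the sketch below. The truncated command above is left as-is; the `--tensor-parallel-size` and `--load-in-low-bit` values here are illustrative assumptions, not the user's original flags.

```bash
#!/bin/bash
# Hypothetical sketch of a full multi-card launch; flag values are
# illustrative, not taken from the (truncated) issue text.
model="/llm/models/Qwen2-72B-Instruct/"
served_model_name="Qwen2-72B-Instruct"
source /opt/intel/1ccl-wks/setvars.sh
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --device xpu \
  --load-in-low-bit fp8 \
  --tensor-parallel-size 4
```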

user issue
multi-arc

**The vLLM docker image is** `intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1`
**The vLLM start command is** `model="/llm/models/meta-llama/LLaMA-33B-HF/" served_model_name="LLaMA-33B-HF" source /opt/intel/1ccl-wks/setvars.sh export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2 python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \ --served-model-name $served_model_name \ --port 8000 \ --model $model \ --trust-remote-code...`

user issue
multi-arc