Target Platform: MTL (Core Ultra 7 165H). Issue: codeqwen-1_5-7b-chat-q4_k_m.gguf using ipex-llm as the backend for llama.cpp has a performance gap compared with PyTorch. Minimum throughput requirement: >15 tokens/s; ideal throughput requirement: >...
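For context, a minimal sketch of how the PyTorch-side tokens/s figure can be measured with ipex-llm's transformers-style API on XPU, to compare against the llama.cpp GGUF numbers. The model ID, prompt, and token counts below are placeholder assumptions, not taken from the issue:

```python
# Hedged sketch: measure decode throughput (tokens/s) on the ipex-llm
# PyTorch path. Model ID, prompt, and token counts are assumptions.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL = "Qwen/CodeQwen1.5-7B-Chat"  # assumed HF counterpart of the GGUF

model = AutoModelForCausalLM.from_pretrained(
    MODEL, load_in_4bit=True, trust_remote_code=True
).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to("xpu")
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=16)   # warm-up run
    torch.xpu.synchronize()                        # needs IPEX XPU runtime
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=128)
    torch.xpu.synchronize()
    elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s (target: >15)")
```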
Below are the benchmark results on both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16, from which we can see that chatglm3-6b has better throughput than MiniCPM-2B. Considering MiniCPM-2B is a 2B model while chatglm3-6b...
Hi, when running python/llm/example/GPU/HuggingFace/LLM/codeshell/server.py as
python server.py --checkpoint-path /home/user/Qwen2-7B-Instruct --device xpu --multi-turn --max-context 1024
40+ Python processes end up running (per ps aux | grep "python"). In the code, the uvicorn...
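One mechanism that can multiply process counts with uvicorn: when `workers` is greater than 1, uvicorn spawns one OS process per worker, and each worker imports the app (and loads any model it constructs). A minimal sketch of that pattern, with the app name, port, and endpoint as assumptions rather than the actual server.py:

```python
# Hedged sketch of the uvicorn worker pattern; module/app names and
# port are assumptions, not the actual server.py.
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok"}

if __name__ == "__main__":
    # workers > 1 requires the "module:attribute" import string and
    # multiplies both the process count and model memory; workers=1
    # keeps a single server process (extra ps entries may still come
    # from libraries that spawn helper processes of their own).
    uvicorn.run("server:app", host="0.0.0.0", port=8000, workers=1)
```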
Platform: Core Ultra 7 165H iGPU. Model: Qwen/Qwen2-7B-Instruct. Following the steps on https://testbigdldocshane.readthedocs.io/en/perf-docs/doc/LLM/Quickstart/vLLM_quickstart.html#, running python offline_inference.py produces an error:
(vllm_ipex_env) user@user-Meteor-Lake-Client-Platform:~/vllm$ python offline_inference.py
/home/user/vllm_ipex_env/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated...
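For reference, the general shape of a vLLM offline_inference.py script, using upstream vLLM's API; the ipex-llm XPU build layers device- and quantization-specific arguments on top of this, and the model path here is an assumption:

```python
# Hedged sketch of vLLM's offline inference API (upstream shape only;
# the actual quickstart script may differ). Model path is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct")  # assumed local or HF path
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["What is AI?"], params)
for out in outputs:
    print(out.outputs[0].text)
```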
openbmb/MiniCPM-2B-sft-bf16 on Text Generation WebUI, issue on Ubuntu 22.04: set up the WebUI following steps similar to the Windows instructions described here: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/webui_quickstart.md. When loading the model, the error shows as below: In the attachment, convert_ipex_model.py converts the glm-4v-9b model to a low-bit model and saves it to a local dir; generate_glm4v_xpu.py is for...
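The attached scripts aren't reproduced here. As a rough sketch, ipex-llm's documented save_low_bit/load_low_bit flow for converting a model to low bit and reloading it from a local dir looks like the following; the model ID, precision, and paths are guesses, not the attachment's contents:

```python
# Hedged sketch of ipex-llm's low-bit save/load flow, assumed to be
# roughly what convert_ipex_model.py does; IDs and paths are guesses.
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL = "THUDM/glm-4v-9b"          # assumption
SAVE_DIR = "./glm-4v-9b-low-bit"   # assumption

# Convert to 4-bit on load, then persist the low-bit weights locally.
model = AutoModelForCausalLM.from_pretrained(
    MODEL, load_in_low_bit="sym_int4", trust_remote_code=True
)
model.save_low_bit(SAVE_DIR)
AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True).save_pretrained(SAVE_DIR)

# Later (e.g., in generate_glm4v_xpu.py), reload without reconverting:
model = AutoModelForCausalLM.load_low_bit(SAVE_DIR, trust_remote_code=True).to("xpu")
```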