yangluchina issues

Results: 2 issues by yangluchina

Arc770 IPEX-LLM performance issue

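While testing large models, single-card performance meets expectations, but performance drops severely in the Arc770 dual-card and 4-card scenarios. Input tokens -> output tokens: 1024 -> 512. Built according to this document: https://github.com/intel-analytics/ipex-llm/blob/main/docs/readthedocs/source/doc/LLM/DockerGuides/vllm_docker_quickstart.md IPEX-LLM version: 2.1.0b2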

user issue

Arc770 IPEX-LLM interaction accuracy issue

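The customer validated this on a Xeon-W machine with four Arc770 cards, using ipex-llm version 2.1.0b2. Issues: 1. Benchmark runs now behave close to normally, but direct calls sometimes return no output. When the prompt ends with a full-width question mark (？), the output is very likely to be empty. Example call: time curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Llama-2-13b-chat-hf", "prompt": "交朋友的原则是什么？", "max_tokens": 1024, "temperature": 0.9 }' 2....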
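Because the empty output in item 1 is reported as intermittent, a single curl call may not reproduce it. The following is a minimal Python sketch that repeats the same request and counts empty completions, with and without the trailing ？. The URL and payload are copied from the curl call above. The helper names (`post_completion`, `is_empty`, `count_empty`, `compare_question_mark`) are hypothetical, and the response layout (`choices[0].text`) is an assumption based on the OpenAI-style completions format, not something the report confirms.

```python
import json
import urllib.request
from typing import Callable

# Endpoint and payload copied from the curl call in the report.
URL = "http://localhost:8000/v1/completions"
PAYLOAD = {
    "model": "Llama-2-13b-chat-hf",
    "prompt": "交朋友的原则是什么？",
    "max_tokens": 1024,
    "temperature": 0.9,
}


def post_completion(payload: dict, url: str = URL, timeout: float = 300.0) -> dict:
    """Send one completion request and return the decoded JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_empty(response: dict) -> bool:
    """Assumption: OpenAI-style body with the generated text at choices[0].text."""
    choices = response.get("choices") or []
    if not choices:
        return True
    return not (choices[0].get("text") or "").strip()


def count_empty(payload: dict, attempts: int,
                send: Callable[[dict], dict] = post_completion) -> int:
    """Send the same payload `attempts` times and count empty completions."""
    return sum(1 for _ in range(attempts) if is_empty(send(payload)))


def compare_question_mark(attempts: int,
                          send: Callable[[dict], dict] = post_completion) -> dict:
    """Count empty completions for the prompt with and without the trailing ？."""
    with_mark = PAYLOAD["prompt"]
    without_mark = with_mark.rstrip("？")
    return {
        "with_question_mark": count_empty({**PAYLOAD, "prompt": with_mark}, attempts, send),
        "without_question_mark": count_empty({**PAYLOAD, "prompt": without_mark}, attempts, send),
    }
```

With the server from the report running, `compare_question_mark(20)` would issue 20 requests per prompt variant; the sample size of 20 is arbitrary. The `send` parameter exists so the counting logic can be checked against a stub without a live server.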

user issue

multi-arc