Wang, Jian4

44 comments by Wang, Jian4

> > PPML-image uses gramine-v1.3.1 as a base image, and I think there won't be many changes in the 1.5 version. Perhaps you should install the jupyter and jupyterlab libraries, ...

The heartbeat loss problem may be caused by `SGX_MEM_SIZE` being set too low. I have never encountered the random `fit()` time issue; I think it may be caused by the...
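As a sketch of raising that limit, assuming the PPML docker convention where `SGX_MEM_SIZE` is passed as an environment variable and mapped to the Gramine enclave size (the image name, device flags, and the `64G` value below are placeholder assumptions, not exact values):

```shell
# Hedged sketch: enlarge the SGX enclave memory for the PPML container.
# <ppml-image> and 64G are placeholders; adjust to your image and workload.
docker run -itd \
    --device=/dev/sgx_enclave --device=/dev/sgx_provision \
    -e SGX_MEM_SIZE=64G \
    <ppml-image>
```

If the heartbeat loss disappears after increasing the size, the enclave was likely being killed or stalled by memory pressure.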

We have verified 6k-token input / 512-token output with vLLM serving: ChatGLM3-6B on 2 Arc cards and Qwen1.5-32B on 4 Arc cards.

Refer to this [issue](https://github.com/THUDM/ChatGLM3/issues/1324). It seems that transformers 4.45.0 hits this issue when running GLM models. You can use transformers 4.37.0 first: `pip install...`
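The original comment is truncated, so the exact command is not shown; a minimal downgrade command consistent with the version mentioned above (the `==` pin syntax is my assumption) would be:

```shell
# Pin transformers to 4.37.0 to work around the GLM issue seen on 4.45.0
pip install transformers==4.37.0
```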

This feature will be supported in this [PR](https://github.com/intel-analytics/ipex-llm/pull/11703).

It seems that `gpu-memory-utilization` is too high, causing card 1 to OOM when the first token is computed. You can reduce it to 0.85 and try again.
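For reference, `--gpu-memory-utilization` is a standard vLLM server flag whose default is 0.9; a sketch of lowering it when launching the OpenAI-compatible server (the model name and any other flags are placeholders):

```shell
# Lower the fraction of GPU memory vLLM reserves from the default 0.9 to 0.85,
# leaving headroom for first-token (prefill) activation memory.
python -m vllm.entrypoints.openai.api_server \
    --model <your-model> \
    --gpu-memory-utilization 0.85
```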

InternVL2-8B is enabled by [this PR](https://github.com/analytics-zoo/vllm/pull/72/files).

This issue cannot be reproduced on Arc, only on MTL; it may be caused by the different scheduling methods of the two systems. It was fixed by [this PR](https://github.com/intel-analytics/ipex-llm/pull/11822).

We haven't tested it on the MTL iGPU before. I tried to reproduce it but encountered a different error. Maybe you can try it in Docker, following [this docker guide](https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/serving/xpu/docker).

I followed [this guide](https://github.com/intel-analytics/Langchain-Chatchat/blob/ipex-llm/INSTALL_linux_xeon.md#) to set up the environment and [this guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/readthedocs/source/doc/LLM/Quickstart/chatchat_quickstart.md) to load [this public 92 MB PDF](https://research.nhm.org/publications/pdfpick.html?id=37061&pdfroot=http://research.nhm.org/pdfs), and it works normally. Maybe you could try to load this pdf...