Wang, Jian4

Results: 10 issues by Wang, Jian4

## Description
To test the occlum team's hostfs_bug_fix.
### 1. Why the change?
To test the occlum team's hostfs_bug_fix.
### 2. User API changes
No
### 3. Summary of the change
change...

### Describe the issue
I followed [this readme](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm#environment-setup) to install the conda env, and tried both building the docker image and installing directly from conda, but I always hit an error. This is...

Labels: Bug, CPU, Compilation

## Description Use ipex_llm to test ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ]...

## Description
Add QWen GGUF support ([model path](https://huggingface.co/Lemmih/Qwen-GGUF/tree/main)).
### 1. Why the change?
Add QWen GGUF support.
### 2. User API changes
No
### 3. Summary of the change
Add QWen...
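As a hedged, swapped-in illustration (not the ipex_llm code path this change adds), a Qwen GGUF file such as those in the linked repo can be exercised with llama-cpp-python; the local file name below is a placeholder, not a file taken from the repo.

```python
# Swapped-in illustration only: sanity-check a Qwen GGUF checkpoint with
# llama-cpp-python, independent of the ipex_llm GGUF support in this PR.
from llama_cpp import Llama

llm = Llama(model_path="./qwen.gguf", n_ctx=2048)  # placeholder local path
out = llm("What is AI?", max_tokens=32)
print(out["choices"][0]["text"])
```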

## Description Test spr occlum start and exec ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? -...

## Description
- [x] convert moe block
- [x] RMSNorm: llama_rms_norm_forward (reference math sketched below)
- [x] MLP: llama_mlp_forward
- [x] kv_cache: transformers-v4.40.4 past_key_value.seen_tokens -> past_key_value._seen_tokens
- [x] fused qkv
- [x] fused rope...
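For reference, here is a minimal sketch of the standard RMSNorm computation that a llama_rms_norm_forward-style replacement is expected to reproduce. It is written against a HuggingFace-style LlamaRMSNorm module (its weight and variance_epsilon attributes) and is an illustrative assumption, not the ipex_llm kernel itself.

```python
import torch

# Hedged sketch: the reference RMSNorm math an optimized
# llama_rms_norm_forward would reproduce. `self` is assumed to be a
# HuggingFace-style LlamaRMSNorm exposing `weight` and `variance_epsilon`.
def rms_norm_forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
    return self.weight * hidden_states.to(input_dtype)
```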

## Description Test build llm cpp_docker ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ]...

## Description Add chatchat and text-gen web-ui to the xpu-serving images ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to...

## Description
Refer to https://github.com/analytics-zoo/vllm/blob/xiangyu_test_202411_0806/vllm/model_executor/models/utils.py#L76; enable not loading the model to XPU.

Performance: Qwen1.5-32B, 4 cards, fp8, 9k-512:

| 0.5.4 | Next Token (ms) |
| ---- | ---- |
| before | 73.26... |
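As a hedged illustration of the general "do not materialize weights on the device yet" pattern (not the vLLM code at the linked line), PyTorch's meta device lets a module be constructed without allocating real storage, with materialization deferred to a later, explicit step.

```python
import torch
import torch.nn as nn

# Hedged sketch of the general pattern, not the vLLM code at the linked line:
# build the module on the meta device so no XPU/CPU memory is allocated,
# then materialize empty storage on the target device before loading weights.
with torch.device("meta"):
    layer = nn.Linear(4096, 4096)   # placeholder module and sizes

print(layer.weight.device)          # device(type='meta'); nothing allocated yet

layer = layer.to_empty(device="cpu")  # allocate uninitialized storage on CPU
```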

## Description Init xpu-tgi dockerfile ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ] N/A...