Wang, Jian4

Results: 10 issues by Wang, Jian4

## Description
To test the occlum team's hostfs_bug_fix.
### 1. Why the change?
To test the occlum team's hostfs_bug_fix.
### 2. User API changes
No
### 3. Summary of the change
change...

### Describe the issue
I followed [this readme](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm#environment-setup) to install the conda env, and tried both building the docker image and installing directly from conda, but I always hit an error. This is...

Labels: Bug, CPU, Compilation

## Description Use ipex_llm to test ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ]...

## Description
Add QWen GGUF support ([model path](https://huggingface.co/Lemmih/Qwen-GGUF/tree/main)).
### 1. Why the change?
Add QWen GGUF support.
### 2. User API changes
No
### 3. Summary of the change
Add QWen...
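As a hedged, swapped-in illustration (not the ipex_llm code path this change adds), a Qwen GGUF file such as those in the linked repo can be exercised with llama-cpp-python; the local file name below is a placeholder, not a file taken from the repo.

```python
# Swapped-in illustration only: sanity-check a Qwen GGUF checkpoint with
# llama-cpp-python, independent of the ipex_llm GGUF support in this PR.
from llama_cpp import Llama

llm = Llama(model_path="./qwen.gguf", n_ctx=2048)  # placeholder local path
out = llm("What is AI?", max_tokens=32)
print(out["choices"][0]["text"])
```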

## Description Test spr occlum start and exec ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? -...

## Description
- [x] convert moe block
- [x] RMSNorm: llama_rms_norm_forward (reference math sketched below)
- [x] MLP: llama_mlp_forward
- [x] kv_cache: transformers-v4.40.4 past_key_value.seen_tokens -> past_key_value._seen_tokens
- [x] fused qkv
- [x] fused rope...
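For reference, here is a minimal sketch of the standard RMSNorm computation that a llama_rms_norm_forward-style replacement is expected to reproduce. It is written against a HuggingFace-style LlamaRMSNorm module (its weight and variance_epsilon attributes) and is an illustrative assumption, not the ipex_llm kernel itself.

```python
import torch

# Hedged sketch: the reference RMSNorm math an optimized
# llama_rms_norm_forward would reproduce. `self` is assumed to be a
# HuggingFace-style LlamaRMSNorm exposing `weight` and `variance_epsilon`.
def rms_norm_forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
    return self.weight * hidden_states.to(input_dtype)
```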

## Description Test build llm cpp_docker ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ]...

## Description Add chatchat and text-gen web-ui to the xpu-serving images ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to...

## Description
Refer to https://github.com/analytics-zoo/vllm/blob/xiangyu_test_202411_0806/vllm/model_executor/models/utils.py#L76; enable not loading the model to XPU.

Performance: Qwen1.5-32B, 4 cards, fp8, 9k-512:

| 0.5.4 | Next Token (ms) |
| ---- | ---- |
| before | 73.26... |
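As a hedged illustration of the general "do not materialize weights on the device yet" pattern (not the vLLM code at the linked line), PyTorch's meta device lets a module be constructed without allocating real storage, with materialization deferred to a later, explicit step.

```python
import torch
import torch.nn as nn

# Hedged sketch of the general pattern, not the vLLM code at the linked line:
# build the module on the meta device so no XPU/CPU memory is allocated,
# then materialize empty storage on the target device before loading weights.
with torch.device("meta"):
    layer = nn.Linear(4096, 4096)   # placeholder module and sizes

print(layer.weight.device)          # device(type='meta'); nothing allocated yet

layer = layer.to_empty(device="cpu")  # allocate uninitialized storage on CPU
```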

## Description Init xpu-tgi dockerfile ### 1. Why the change? ### 2. User API changes ### 3. Summary of the change ### 4. How to test? - [ ] N/A...