Yang Wang

45 comments of Yang Wang

> Hi Yang, going back to 8 GPUs on Flex with 32 attention heads, I reran on the same platform and verified this info when I ran `print(model)` --...
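(Not from the thread; a minimal sketch of the divisibility check behind this head-count discussion, assuming the heads are sharded tensor-parallel style across GPUs. The function name is illustrative, not an IPEX-LLM API.)

```python
def heads_per_rank(num_attention_heads: int, world_size: int) -> int:
    """For tensor parallelism, attention heads are sharded across ranks,
    so the head count must divide evenly by the number of GPUs."""
    if num_attention_heads % world_size != 0:
        raise ValueError(
            f"{num_attention_heads} heads cannot be sharded evenly "
            f"over {world_size} GPUs")
    return num_attention_heads // world_size

# Llama-2-7B has 32 attention heads; on the 8-GPU Flex setup each
# GPU would hold 4 heads.
print(heads_per_rank(32, 8))  # -> 4
```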

> Hi Yang, please see attached output text file -- [8GPUs_llama2_7B.txt](https://github.com/intel-analytics/ipex-llm/files/14910582/8GPUs_llama2_7B.txt)

@gbertulf if you are loading the model in FP32, it could be the case that all 8 models are...
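(Not from the thread; a back-of-the-envelope estimate of why 8 FP32 copies of a 7B model can exhaust memory. The parameter count and byte sizes are the usual rough figures, not measurements from the attached log.)

```python
def model_memory_gb(n_params_billion: float, bytes_per_param: int) -> float:
    """Rough resident footprint of one model copy's weights, in GiB."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

one_copy = model_memory_gb(7, 4)   # FP32 = 4 bytes/param: ~26 GiB per copy
print(round(one_copy * 8, 1))      # 8 independent copies: ~208.6 GiB total
```

Loading in FP16 (2 bytes/param) or a low-bit format would cut each copy proportionally.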

> Hi Yang, from our debug sync you indicated that on the same machine your fellow team members were not seeing issues on the 8-GPU config. May I kindly ask for...

Maybe we should wrap the model implementation inside the IPEX-LLM package and only expose an API for users to call in the example, e.g., in the example code: for running offline...
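(Not from the thread; a hypothetical sketch of the proposed split, where the package owns the model implementation and the example reduces to one call. All names here, `_load_model`, `generate`, `checkpoint`, are illustrative, not the actual IPEX-LLM API.)

```python
# Package side: implementation details stay internal.
_model = None

def _load_model(checkpoint):
    # Stand-in for the real (package-internal) model construction;
    # it just echoes the prompt so the sketch stays runnable.
    return lambda prompt: f"echo:{prompt}"

def generate(prompt, checkpoint="llama2-7b"):
    global _model
    if _model is None:            # lazy, one-time model setup
        _model = _load_model(checkpoint)
    return _model(prompt)

# Example side: the user-facing script shrinks to a single API call.
print(generate("hello"))  # -> echo:hello
```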

> @yangw1234 were you able to find a solution?

We just restart the finetuning process after CPU OOM, which, hopefully, is not very frequent.