binbin Deng

Results: 28 comments by binbin Deng

Hi, to run neural-chat 7b inference using DeepSpeed AutoTP and our low-bit optimization, you could follow these steps: 1) Prepare your environment following the [installation steps](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Deepspeed-AutoTP#1-install). In particular, for the neural-chat-7b model, you...

Devices 0 and 1 are used by default in our script. Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/KeyFeatures/multi_gpus_selection.html) for more details on how to select devices. According to my experiment on two A770s,...
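The device selection described in the linked doc comes down to restricting which GPUs the Level Zero runtime exposes to the process. A minimal sketch, assuming an Intel GPU stack that honors the `ZE_AFFINITY_MASK` environment variable (it must be set before the runtime enumerates devices, i.e. before importing torch/ipex):

```python
import os

# Restrict this process to GPUs 0 and 1 (a sketch; ZE_AFFINITY_MASK is read by
# the Level Zero driver at device-enumeration time, so set it before importing
# torch or intel_extension_for_pytorch).
os.environ["ZE_AFFINITY_MASK"] = "0,1"

print(os.environ["ZE_AFFINITY_MASK"])
```

Alternatively, export the variable in the shell before launching the script, which avoids any import-ordering concerns.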

> We are fixing this in PR: #10051

This bug is fixed; please use tomorrow's nightly version or a later one. Thanks for reporting this issue, @Kailuo-Lai!

The English version has been updated. Some changes also need to be applied to the Chinese version to keep the two aligned.

Maybe we could merge this first and refactor the model implementation into the source code later.

Hi @kevin-t-tang, I could reproduce this error with Qwen-7B-Chat and Qwen-14B-Chat using AutoTP, and this PR (https://github.com/intel-analytics/ipex-llm/pull/10766) should fix it. Please give it a try after the PR is merged to...

Hi @raj-ritu17, I have reproduced the error during model merging. We will try to fix it and will update here once it is solved.

Hi @raj-ritu17. We have fixed this bug. Please install the latest ipex-llm (2.1.0b20240527); there is no need to modify the utils code, just run [this script](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/export_merged_model.py) to merge the model. According to...
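For context on what the merge script does conceptually: a QLoRA adapter stores a pair of low-rank factors per layer, and merging folds them into the frozen base weight so the exported model needs no adapter at inference time. A toy NumPy sketch of that arithmetic (illustrative only, not ipex-llm's actual implementation; the names `W`, `A`, `B`, `alpha`, `r` are made up here):

```python
import numpy as np

# Toy sketch of LoRA adapter merging (illustrative; not ipex-llm's real code).
# The adapter stores low-rank factors A (r x in) and B (out x r); merging folds
# scale * B @ A into the frozen base weight W so inference needs no adapter.
rng = np.random.default_rng(0)
out_f, in_f, r, alpha = 8, 8, 2, 4
scale = alpha / r

W = rng.standard_normal((out_f, in_f))   # frozen base weight
A = rng.standard_normal((r, in_f))       # LoRA down-projection
B = rng.standard_normal((out_f, r))      # LoRA up-projection

W_merged = W + scale * (B @ A)

# The merged weight reproduces base-plus-adapter outputs (up to float error).
x = rng.standard_normal(in_f)
assert np.allclose(W_merged @ x, W @ x + scale * (B @ (A @ x)))
```

The real script additionally dequantizes the low-bit base weights before folding in the adapter, which is why it must be run rather than merging by hand.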

Hi @raj-ritu17, could you please try the latest ipex-llm (2.1.0b20240527) and merge the adapter into the original model as we discussed in https://github.com/intel-analytics/ipex-llm/issues/11135? Then you could use the merged model...