AmberX comments

Results: 6 comments by AmberX

Qwen issues with >2K input token size on MTL

I encountered the same "Native API failed" error when the input size exceeds 2000 tokens.
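To make a report like this reproducible, it helps to state the exact token count of the failing prompt. Below is a minimal sketch of such a check. The helper names `count_prompt_tokens` and `exceeds_threshold` are hypothetical, and the 2000-token threshold is only the approximate figure mentioned in this thread, not a documented limit.

```python
from typing import Callable, List

# Approximate failure point reported in this thread (an assumption, not an official limit).
TOKEN_THRESHOLD = 2000


def count_prompt_tokens(prompt: str, tokenize: Callable[[str], List[int]]) -> int:
    """Return the number of token ids the given tokenizer produces for the prompt."""
    return len(tokenize(prompt))


def exceeds_threshold(
    prompt: str,
    tokenize: Callable[[str], List[int]],
    threshold: int = TOKEN_THRESHOLD,
) -> bool:
    """Return True when the prompt is longer than the threshold, measured in tokens."""
    return count_prompt_tokens(prompt, tokenize) > threshold
```

With Hugging Face `transformers`, the `tokenize` argument could be `tokenizer.encode`, where `tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B-Chat', trust_remote_code=True)`. Any callable that maps a string to a list of token ids works.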

Qwen issues with >2K input token size on MTL

> Hi @AmberXu98
>
> Do you encounter this error `Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)` meaning out of memory?
>
> Also could you provide more information of your settings...

Qwen issues with >2K input token size on MTL

> Hi @AmberXu98
>
> We are reproducing with your prompt again to double confirm it in our environment.
>
> By the way, want to confirm is it the...

Qwen issues with >2K input token size on MTL

> > > Hi @AmberXu98
> > >
> > > We are reproducing with your prompt again to double confirm it in our environment.
> > >
> > > By the way, want to...

Qwen issues with >2K input token size on MTL

@hkvision @chtanch I encounter `Native API returns: -999 (Unknown PI error)`. How can I solve it? The bigdl-llm version on my machine is 2.5.0b20240219. Shorter inputs work, only longer inputs...

Qwen issues with >2K input token size on MTL

> @AmberXu98 to clarify, are you using [Qwen-7b-chat-int4](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4)? Or the non-quantized models? Eg, [Qwen-7b](https://huggingface.co/Qwen/Qwen-7B) or [Qwen-7b-chat](https://huggingface.co/Qwen/Qwen-7B-Chat)?

@chtanch I am using the non-quantized model **Qwen-7B-Chat**, loaded with `load_in_4bit=True` like this: `model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen-7B-Chat',load_in_4bit=True,optimize_model=True,trust_remote_code=True,use_cache=True,cpu_embedding=True).eval()`