grandxin
> we have made some breaking changes to qwen-1.5's int4 checkpoint in the 5.21 version; old int4 checkpoints (generated by ipex 0520 or earlier) cannot be loaded with the new ipex-llm (0521 or later),...
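In practical terms, an old int4 checkpoint has to be regenerated with the newer ipex-llm release before it will load again. A minimal sketch of that workflow, assuming `ipex-llm` and `transformers` are installed; the model name and save path here are illustrative, not taken from the thread:

```python
# Sketch: regenerate an int4 checkpoint with a current ipex-llm release.
# The import is guarded so the snippet degrades gracefully where
# ipex-llm is not installed; in a real run it will be present.
try:
    from ipex_llm.transformers import AutoModelForCausalLM
except ImportError:
    AutoModelForCausalLM = None  # ipex-llm not available in this environment

save_path = "./qwen-7b-chat-sym-int4"  # hypothetical output directory

if AutoModelForCausalLM is not None:
    # Re-quantize from the original FP16 weights using the *new* ipex-llm,
    # then persist the low-bit checkpoint for later runs.
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat",
        load_in_low_bit="sym_int4",
        trust_remote_code=True,
    )
    model.save_low_bit(save_path)

    # Subsequent runs load the regenerated checkpoint directly:
    model = AutoModelForCausalLM.load_low_bit(save_path, trust_remote_code=True)
```

A checkpoint saved this way is tied to the quantization format of the release that produced it, which is why the 0520-era files stop loading after the format change.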
> > ok, got it.
> >
> > Does the new version have any improvements, such as quantization accuracy or RAM usage?
>
> yes, there should be some improvements on speed and...
> > I regenerated the qwen-7b int4 model and ran it on my laptop (Ultra 7 155H), but the "warm-up" stage takes a very long time (more than 5 minutes). Do you have...
> > I found that warm-up is much faster in CPU mode (about 10-20 s) but slower in XPU mode.
>
> CPU doesn't need JIT compilation, while the GPU does....
I have the same problem today. Has this bug been fixed?
Have you solved this problem? I also ran the qwen2-7b (int4) example on the NPU. Inference speed is very slow, only 2-3 tokens/s.
> Hi @grandxin, I could not reproduce this error on MTL with the `32.0.100.2540` driver.
>
> Using `ipex-llm==2.1.0b20240814`, the output of `Qwen2-1.5B-Instruct` with `load_low_bit=sym_int4` is: ```shell...