grandxin
> we have made some breaking changes to qwen-1.5's int4 checkpoint in the 5.21 version; old int4 checkpoints (generated by ipex 0520 or earlier) cannot be loaded with the new ipex-llm (0521 or later),...
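In practical terms, an old int4 checkpoint has to be regenerated with the newer ipex-llm release before it will load again. A minimal sketch of that workflow, assuming `ipex-llm` and `transformers` are installed; the model name and save path here are illustrative, not taken from the thread:

```python
# Sketch: regenerate an int4 checkpoint with a current ipex-llm release.
# The import is guarded so the snippet degrades gracefully where
# ipex-llm is not installed; in a real run it will be present.
try:
    from ipex_llm.transformers import AutoModelForCausalLM
except ImportError:
    AutoModelForCausalLM = None  # ipex-llm not available in this environment

save_path = "./qwen-7b-chat-sym-int4"  # hypothetical output directory

if AutoModelForCausalLM is not None:
    # Re-quantize from the original FP16 weights using the *new* ipex-llm,
    # then persist the low-bit checkpoint for later runs.
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat",
        load_in_low_bit="sym_int4",
        trust_remote_code=True,
    )
    model.save_low_bit(save_path)

    # Subsequent runs load the regenerated checkpoint directly:
    model = AutoModelForCausalLM.load_low_bit(save_path, trust_remote_code=True)
```

A checkpoint saved this way is tied to the quantization format of the release that produced it, which is why the 0520-era files stop loading after the format change.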
> > ok, got it.
> >
> > Does the new version have any improvements, such as quantization accuracy or RAM usage?
>
> yes, there should be some improvements on speed and...
> > I regenerated the qwen-7b int4 model and ran it on my laptop (Ultra 7 155H), but the "warm-up" stage takes a very long time (more than 5 minutes). Do you have...
> > I found that warm-up is much faster in CPU mode (about 10-20 s) but slower in XPU mode.
>
> CPU doesn't need JIT compilation, while the GPU does....
I have the same problem today. Has this bug been fixed?
Have you solved this problem? I also ran the qwen2-7b (int4) example on the NPU. Inference speed is very slow, only 2-3 tokens/s.
> Hi @grandxin, I could not reproduce this error on MTL with the `32.0.100.2540` driver.
>
> Using `ipex-llm==2.1.0b20240814`, the output of `Qwen2-1.5B-Instruct` with `load_low_bit=sym_int4` is: ```shell...