Ruonan Wang
## Description

We need to align the functions in BaseForecaster and AutoformerForecaster so the two stay in sync.

### 1. Why the change?

https://github.com/intel-analytics/BigDL/issues/5834

### 2. User API changes

```
from bigdl.chronos.forecaster import AutoformerForecaster
forecaster =...
```
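Since the snippet above is truncated, here is a minimal sketch of how an AutoformerForecaster is typically constructed for context; the constructor arguments shown (`past_seq_len`, `label_len`, `freq`, and the rest) are illustrative assumptions, not the exact values from this PR:

```python
from bigdl.chronos.forecaster import AutoformerForecaster

# Constructor arguments are illustrative assumptions, not values from this PR.
forecaster = AutoformerForecaster(past_seq_len=24,      # lookback window
                                  future_seq_len=5,     # horizon to predict
                                  input_feature_num=2,
                                  output_feature_num=2,
                                  label_len=12,         # assumed decoder warm-start length
                                  freq="h")             # assumed hourly data
```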
## Improvements

There are two parts of InferenceOptimizer that need to be improved:

- Reduce the time cost under the default parameters #5740
- Improve the output of the optimize process...
## Background

Nano currently provides `optimize` and `get_best_model` in InferenceOptimizer so that users can get an accelerated model with the globally minimal latency. However, in actual use scenarios, when:...
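For reference, a minimal sketch of the flow described above, assuming the PyTorch `InferenceOptimizer` from `bigdl.nano` (parameter names such as `training_data` are from my reading of the Nano docs and may vary by version):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from bigdl.nano.pytorch import InferenceOptimizer

model = torch.nn.Linear(8, 2).eval()
data = DataLoader(TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,))),
                  batch_size=8)

opt = InferenceOptimizer()
# Benchmarks the available acceleration options (ipex, onnxruntime, openvino, ...)
opt.optimize(model=model, training_data=data)
opt.summary()  # prints the per-option latency table
# Picks the variant with the global minimum latency
best_model, option = opt.get_best_model()
print(option)
```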
In my test, mobilevit_xs takes 113 ms to process one image, which is much larger than the value reported in the paper.
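For context, this is roughly how such a per-image latency can be measured; a sketch assuming the `timm` implementation of mobilevit_xs and a 256x256 input, since the issue does not specify the exact setup:

```python
import time
import torch
import timm

model = timm.create_model("mobilevit_xs", pretrained=False).eval()
x = torch.randn(1, 3, 256, 256)  # assumed input resolution

with torch.no_grad():
    for _ in range(10):                      # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    elapsed = time.perf_counter() - start

print(f"avg latency per image: {elapsed / 100 * 1000:.1f} ms")
```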
Hi, when I use IPEX quantization with INC, I run into a problem: the quantized model can't be loaded after saving. To save, I just call `quantized.save(path)`, and I get...
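For reference, the save/load round-trip as I understand the INC PyTorch flow; this is a sketch of the default (FX/eager) path, and the IPEX backend in the issue may save a TorchScript artifact instead, which is presumably where the load failure comes in. Exact module paths differ across INC versions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit
from neural_compressor.utils.pytorch import load

fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
calib = DataLoader(TensorDataset(torch.randn(16, 8)), batch_size=2)

quantized = fit(model=fp32_model,
                conf=PostTrainingQuantConfig(),   # default backend; the issue uses IPEX
                calib_dataloader=calib)
quantized.save("./saved")                         # writes the checkpoint directory

# INC's eager-mode loader rebuilds the quantized graph from the fp32 model
restored = load("./saved", fp32_model)
```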
### Describe the bug

I found that on a PVC GPU, FP16 output is non-deterministic: running the same LLM model twice gives different outputs, below...
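A minimal determinism check along the lines of this report might look as follows; the `ipex_llm.transformers` import path, the placeholder model path, and the greedy decoding settings are my assumptions, since the issue's actual model and prompt are elided:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed ipex-llm wrapper

model_path = "path/to/model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True).to("xpu")

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
outs = [model.generate(**inputs, max_new_tokens=32, do_sample=False)
        for _ in range(2)]
# With greedy decoding the two runs should match exactly;
# the report says they differ in fp16 on PVC.
print(torch.equal(outs[0], outs[1]))
```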
## Description

This splits chatglm3's MLP and uses MLP fusion, which saves ~1 ms on MTL. However, quantized KV cache + MLP fusion changes the output on Arc...
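For intuition, the split presumably looks like the following; this is a conceptual sketch, not the actual ipex-llm kernel: chatglm3 packs the gate and up projections into one `dense_h_to_4h` weight, which can be split in two so a fused SwiGLU kernel can consume the halves directly:

```python
import torch
import torch.nn.functional as F

hidden, inter = 4096, 13696          # chatglm3-6b-like sizes (illustrative)
x = torch.randn(1, hidden)
w_packed = torch.randn(2 * inter, hidden)  # packed gate+up projection weight

# Unfused reference: one big matmul, then split the activation.
g, u = F.linear(x, w_packed).chunk(2, dim=-1)
ref_out = F.silu(g) * u

# "Split" form: separate gate/up weights feeding a fused SwiGLU-style op.
w_gate, w_up = w_packed.chunk(2, dim=0)
out = F.silu(F.linear(x, w_gate)) * F.linear(x, w_up)

assert torch.allclose(ref_out, out, atol=1e-5)
```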
### Describe the bug

I found that in a Jupyter notebook, `to('xpu')` makes the Jupyter kernel die.

### Notebook to reproduce

```python
import intel_extension_for_pytorch as ipex
from transformers import...
```
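Since the cells above are truncated, a minimal self-contained repro of the same pattern could be; the model is a placeholder, and the crash in the report happens on the `.to('xpu')` call:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

model = torch.nn.Linear(4, 4)
model.to("xpu")                    # reported to kill the Jupyter kernel
x = torch.randn(1, 4, device="xpu")
print(model(x))
```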
## Description

Update the llama.cpp troubleshooting documentation.

### 1. Why the change?

https://github.com/intel-analytics/ipex-llm/issues/10989

### 4. How to test?

- [ ] Document test
## Description

### 1. Why the change?

https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2076658639

### 2. User API changes

```python
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_m',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
```

```python
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_s',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,...
```
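For completeness, a typical generation call with a model loaded this way might look as follows; a sketch in which the `ipex_llm.transformers` import path, the placeholder model path, and the prompt are assumptions, since the PR itself only shows the `from_pretrained` calls:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed import path

model_path = "path/to/model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_m',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```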