duyanyao
### Describe the issue Hello, I have recently been running the LLaMA experiment (https://intel.github.io/intel-extension-for-pytorch/llm/cpu/) and, with int8 quantization, I hope to reach the 35 ms/token mentioned in this article (https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-llama2-ai-hardware-sw-optimizations.html). At present, I am using...
### Describe the issue Now I'm replicating this [implementation](https://intel.github.io/intel-extension-for-pytorch/llm/cpu/#compile-from-source) with pytorch=2.1.0.dev20230711+cpu and intel-extension-for-pytorch=2.1.0.dev0+cpu.llm, but an error occurred while executing the Llama 2 quantization step: `python run_llama_int8.py --ipex-smooth-quant --lambada --output-dir "saved_results" --jit...`