duyanyao
### Describe the issue Hello, I have recently been running the LLaMA experiment (https://intel.github.io/intel-extension-for-pytorch/llm/cpu/) and, with int8 quantization, I hope to reach the 35 ms/token mentioned in this article (https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-llama2-ai-hardware-sw-optimizations.html). At present, I am using...
### Describe the issue Now I'm replicating this [implementation](https://intel.github.io/intel-extension-for-pytorch/llm/cpu/#compile-from-source) with pytorch=2.1.0.dev20230711+cpu and intel-extension-for-pytorch=2.1.0.dev0+cpu.llm, but an error occurred while executing the Llama 2 quantization step: `python run_llama_int8.py --ipex-smooth-quant --lambada --output-dir "saved_results" --jit...`