Yufeng Li

Results: 73 comments by Yufeng Li

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal...

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal...

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

> Thank you very much for your response.
>
> I'm having another problem and I'm struggling to find the material to help me for solving my task. I'm trying...

> In my opinion the problem should still be fixed, especially since we can see the fix is there. For what reason do we refuse to do so? Windows and Office are deployed...

Hi @VishalX, could you please try quantizing the model directly with a command like `python -m onnxruntime.quantization.matmul_4bits_quantizer`? Also, is your model a fine-tuned model or the original Llama 2?

> Hey @yufenglee, I'm using original llama2: [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b), exported to ONNX using below command.
>
> ```powershell
> python -m onnxruntime.transformers.models.llama.convert_to_onnx -m meta-llama/Llama-2-7b-hf --output llama2-7b
> ```
>
> Let...

I can repro the issue locally.
- For the Symmetric quantization, the "Hinweis: Die folgende Seite ist nur auf Englisch verfügbar." in the 1st prompt is German and means "Note:...