Vishal Jain
### Describe the issue

I am trying to quantize and run the `Llama-2-7b-hf` model using the example [here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/language_model/llama/weight_only_quant). I was able to successfully generate the `int4` model with GPTQ quantization by...
release:1.17.0
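
For reference, a minimal sketch of what int4 weight-only quantization of an exported ONNX Llama model can look like with `onnxruntime.quantization`'s `MatMul4BitsQuantizer`. Note this is the RTN-based quantizer, not the GPTQ path that the linked example script wraps via neural-compressor, and the model paths and block-size settings below are placeholders, not values from the issue:

```python
# Sketch only: paths and parameters are assumptions, not taken from the issue.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model_fp32_path = "llama2-7b-fp32/model.onnx"   # hypothetical input path
model_int4_path = "llama2-7b-int4/model.onnx"   # hypothetical output path

# Load the exported fp32 model, including its external weight files.
model = onnx.load(model_fp32_path, load_external_data=True)

# Quantize MatMul weights to 4 bits, block-wise along the weight columns.
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()

# Llama-2-7B exceeds the 2 GB protobuf limit, so save with external data.
quantizer.model.save_model_to_file(model_int4_path, use_external_data_format=True)
```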