Ai-ZL
When I run the quantization step in this code, it stops with the error: CUDA out of memory. What can I do to solve this issue in...
Op (Softmax) [ShapeInferenceError] 'axis' must be in [0 , -1]. Its actual value is: -1 (Softmax-13)
Bug Report. Describe the bug: I use the quantize_static and convert_float_to_float16 functions in onnx to convert an fp32 model to an fp16 + int8 model. The fp16 model can run inference through...
Hello! I noticed that quantization was used in the article https://arxiv.org/pdf/2102.01547. Could you please tell me what quantization method was used? Additionally, I would like to ask if there are...
Hello, can llmc support Whisper model quantization? Or what modifications need to be made to llmc to support quantization of the Whisper model?