Ai-ZL issues

Results: 4 issues of Ai-ZL

About CUDA out of memory

When I run the quantization step in this code, it stops with an error: CUDA out of memory. What can I do to solve this issue in...

question

Op (Softmax) [ShapeInferenceError] 'axis' must be in [0 , -1]. Its actual value is: -1 (Softmax-13)

Bug Report. Describe the bug: I use the quantize_static and convert_float_to_float16 functions in onnx to convert an fp32 model to an fp16 + int8 model. The fp16 model can run inference through...

bug
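
One possible reading of the message in the issue title, offered as an assumption rather than a diagnosis: rank-based axis checks typically accept an axis in [-r, r-1] for an input of rank r, so a printed range of [0 , -1] is what that formula yields when r = 0. That would suggest the shape inferred for the Softmax input has rank 0 after conversion, not that -1 is an unusual axis value. The sketch below only re-creates the arithmetic; `check_axis` is a hypothetical helper, not part of the ONNX API.

```python
def check_axis(axis: int, rank: int) -> str:
    """Hypothetical re-creation of a rank-based axis range check (not the ONNX API)."""
    lo, hi = -rank, rank - 1  # valid axes for a rank-`rank` input
    if axis < lo or axis > hi:
        return f"'axis' must be in [{lo} , {hi}]. Its actual value is: {axis}"
    return "ok"


# A rank-3 input accepts axis=-1; a rank-0 input accepts no axis at all.
print(check_axis(-1, 3))
print(check_axis(-1, 0))
```

If this reading applies, inspecting the inferred shape of the tensor that feeds the Softmax node in the converted model would be a reasonable first step.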

What quantization method was used?

Hello! I noticed that quantization was used in the article https://arxiv.org/pdf/2102.01547. Could you please tell me what quantization method was used? Additionally, I would like to ask if there are...

Can llmc support Whisper model quantization?

Hello, can llmc support Whisper model quantization? Or what modifications need to be made to llmc to support quantization of the Whisper model?