Inquiry Regarding Model Quantization and Performance Optimization
Feature request
Quantization Process and Script Availability: The documentation mentions that the model can be run with INT4 or INT8 inference, which is significant for deployment on NVIDIA devices. Could you please provide the quantization calibration scripts for the 4-bit/8-bit models? Could you also elaborate on the quantization methodology and on how to reproduce the quantized models' performance stably?
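For context on what "calibration" means here: dataset-driven calibration runs a small set of representative samples through the model to collect activation ranges before fixing the quantization scales, instead of deriving scales from each batch on the fly. A minimal NumPy sketch of that idea (illustrative names only, not CogAgent's actual pipeline):

```python
import numpy as np

def collect_activation_range(batches):
    """Calibration phase: track the overall min/max across calibration batches."""
    lo = min(b.min() for b in batches)
    hi = max(b.max() for b in batches)
    return float(lo), float(hi)

def calibrated_int8_quantize(x, lo, hi):
    """Quantize with a scale fixed by calibration data, not by the current batch."""
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Calibration: gather ranges from a few representative batches
rng = np.random.default_rng(0)
calib_batches = [rng.normal(0, 1, 1024).astype(np.float32) for _ in range(8)]
lo, hi = collect_activation_range(calib_batches)

# Inference: reuse the calibrated scale on new data
x = rng.normal(0, 1, 1024).astype(np.float32)
q, scale, zp = calibrated_int8_quantize(x, lo, hi)
err = np.abs(dequantize(q, scale, zp) - x).max()
```

Values outside the calibrated range get clipped, which is the trade-off calibration tries to balance against resolution.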
Motivation
Performance Improvement for Quantized Versions: In our testing, the 4-bit/8-bit quantized versions of the model perform slightly worse than the original. Are there any ongoing efforts or suggestions for improving these quantized models? What key considerations or best practices should we follow to optimize their performance?
Your contribution
--
We use the bitsandbytes library for simple quantization; the CLI demo contains the corresponding line: `quantization_config=BitsAndBytesConfig(load_in_8bit=True),`
However, this method will inevitably incur some loss, because it performs no dataset-based calibration. We currently do not have the bandwidth to do calibration-based quantization, so we are using this solution. In the future, if contributors are willing to help, we will give it a try.
The attached spreadsheet contains the results of my tests on ScreenShotV2, covering the categories mobile_text, mobile_icon, desktop_text, desktop_icon, web_text, and web_icon. Performance is measured as percentage accuracy for several versions of CogAgent: T1, T2, and two bitsandbytes configurations (int4 and int8).

To my surprise, the bitsandbytes int8 configuration consistently performs worse than the int4 configuration across all tested categories. This is counterintuitive: with its higher bit depth, I would expect int8 to match or outperform int4.

Could you help me understand why int8 might underperform int4 here? Could it relate to how the data is processed, the quantization algorithms, or some other technical aspect I have overlooked? Understanding this difference matters to me, since it will influence the configurations I choose for future projects. Thank you for your time; I look forward to any explanation you can provide.
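One factor worth ruling out: the two bitsandbytes paths are not simply "8 bits vs 4 bits". LLM.int8() quantizes vector-wise and keeps outlier features in fp16, while the 4-bit NF4 path uses small blocks (64 values per scale by default), so scale granularity differs as much as bit width. This toy NumPy experiment (a sketch of absmax round-trip error only, not a reproduction of the bitsandbytes kernels) shows how block granularity can dominate raw bit width when outlier weights are present:

```python
import numpy as np

def absmax_quantize(x, bits, block):
    """Symmetric absmax round-trip quantization with one scale per `block` values."""
    qmax = 2 ** (bits - 1) - 1
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / qmax
    return (np.round(xb / scale) * scale).reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 4096)
w[::512] *= 20.0  # a handful of outlier weights, as commonly seen in LLM layers

err8_coarse = np.abs(absmax_quantize(w, 8, 4096) - w).mean()  # int8, one scale overall
err4_coarse = np.abs(absmax_quantize(w, 4, 4096) - w).mean()  # int4, one scale overall
err4_block  = np.abs(absmax_quantize(w, 4, 64) - w).mean()    # int4, blockwise scales
```

At equal (coarse) granularity, 4-bit error is far larger than 8-bit error, but small blocks confine the damage an outlier does to its own block, which is part of why the int4/int8 ordering in practice depends on the weight distribution and each scheme's outlier handling, not on bit width alone.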
I am testing the accuracy of the quantized CogAgent model. My current approach is to check whether the coordinate center returned by the model falls inside the annotated bbox. What is your specific approach?
I'm glad to hear that you're testing the accuracy of the quantized CogAgent model. I use a similar approach: I check whether the coordinate center returned by the model falls within the annotated bounding box. This method helps evaluate the precision of the model's grounding predictions.
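The metric described above can be sketched in a few lines; this assumes predicted points and boxes share the same coordinate space (the function and variable names are illustrative, not from the CogAgent codebase):

```python
def center_in_bbox(pred_center, bbox):
    """Check whether the predicted click point lies inside the annotated box.

    pred_center: (x, y); bbox: (x_min, y_min, x_max, y_max), same coordinate space.
    """
    x, y = pred_center
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max

def grounding_accuracy(pred_centers, bboxes):
    """Fraction of samples whose predicted center falls inside the ground-truth bbox."""
    hits = sum(center_in_bbox(p, b) for p, b in zip(pred_centers, bboxes))
    return hits / len(bboxes)
```

For example, with predictions `[(5, 5), (20, 20)]` against two copies of the box `(0, 0, 10, 10)`, the first prediction hits and the second misses, giving an accuracy of 0.5.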