AwesomeCodingBoy
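When a network quantizes poorly, and the inference framework allows it, we can send some operators back to fp32. This is the correct way to handle quantized deployment. However, in the latest commit, the code appears to measure quantization sensitivity by computing mAP directly on the validation set. In the current YOLOv6, nearly 100 layers need their quantization sensitivity analyzed, so the entire validation process has to run nearly 100 times. That will not finish any time soon. A more efficient approach is to measure the signal error on the network's last few feature maps rather than the final mAP, and to do this on the calibration set rather than the whole test set.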
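To make the proposal concrete, here is a minimal NumPy sketch of feature-map-based sensitivity analysis. It assumes a toy MLP standing in for the real network, symmetric per-tensor int8 fake quantization, and a relative squared-error ("noise-to-signal") metric on the final feature map. All function names (`layer_sensitivity`, `pick_fp32_layers`, etc.) are hypothetical illustrations, not PPQ or YOLOv6 APIs. The sketch quantizes one layer at a time, scores each layer on a small calibration batch instead of running a full mAP evaluation, and then picks the worst layers as fp32 fallback candidates.

```python
import numpy as np


def fake_quant(x, num_bits=8):
    """Symmetric per-tensor fake quantization: round to a signed integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    amax = np.max(np.abs(x))
    if amax == 0:
        return x.copy()
    scale = amax / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


def forward(weights, x, quantized_layers=(), num_bits=8):
    """Toy MLP (matmul + ReLU). Layers in quantized_layers use fake-quantized inputs and weights."""
    for i, w in enumerate(weights):
        if i in quantized_layers:
            x = fake_quant(x, num_bits) @ fake_quant(w, num_bits)
        else:
            x = x @ w
        if i < len(weights) - 1:  # no ReLU after the output layer
            x = np.maximum(x, 0.0)
    return x


def noise_to_signal(ref, out):
    """Relative signal error ||out - ref||^2 / ||ref||^2 on a feature map (lower is better)."""
    return float(np.sum((out - ref) ** 2) / np.sum(ref ** 2))


def layer_sensitivity(weights, calib, num_bits=8):
    """Quantize one layer at a time; score the error on the final feature map over the calibration batch."""
    ref = forward(weights, calib)  # fp32 reference output
    return [
        noise_to_signal(ref, forward(weights, calib, quantized_layers=(i,), num_bits=num_bits))
        for i in range(len(weights))
    ]


def pick_fp32_layers(scores, k):
    """Indices of the k most sensitive layers, i.e. candidates to fall back to fp32."""
    return np.argsort(scores)[::-1][:k].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((16, 16)) / 4 for _ in range(4)]
    calib = rng.standard_normal((32, 16))  # small calibration batch, not the full test set
    scores = layer_sensitivity(weights, calib)
    print("sensitivity per layer:", scores)
    print("fall back to fp32:", pick_fp32_layers(scores, k=1))
```

Each layer costs one forward pass over a handful of calibration samples, which is far cheaper than one full validation run with mAP computation per layer. In a real network, you would compare the last few output feature maps (for example, the detection heads) rather than a single output.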
Hi, developer team of onnx2torch. I am currently developing a neural network quantization framework: https://github.com/openppl-public/ppq/tree/master/ppq. The really interesting part is that we both need to run an onnx model with...
Hi, developers of brevitas. I worked with the Vitis AI team at Xilinx for several months (2020~2021, internship), and I still work on building better network quantization tools at SenseTime...
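* Added many new usage documentation files
* Removed PPLCUDA_INT4_Quantizer, PPLCUDAMixPrecisionQuantizer, and related content
* Added the attribute require_export to tqc, for controlling the export logic later on
* Fixed a subgraph partitioning bug: when a block was connected directly to the graph input, the partition could be wrong
* Adjusted some program logic so that the new examples can run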
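This update upgrades the core to 0.6.6, changes the graph scheduling and heterogeneous execution strategy, and adds dynamic quantization and FP8 quantization. We currently use E4M3 for FP8 quantization; the table below shows model accuracy under FP8 quantization.
| | Inceptionv3 | mnasnet 0.5 | mnasnet 1.0 | squeezenet | shufflenet | resnet18 | mobilenetv2 | mobilenetv3 | efficientnet-b0 | efficientnet-b1
-- | -- | -- | --...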
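For readers unfamiliar with E4M3, here is a minimal NumPy sketch of FP8 E4M3 fake quantization. It assumes the common E4M3 layout: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, a largest finite value of 448, no infinities, and subnormals down to 2^-9. It also assumes round-to-nearest-even, saturation at ±448, and a per-tensor scale that maps the tensor's absolute maximum to 448. These are illustrative choices, not necessarily what PPQ 0.6.6 implements, and `round_to_e4m3` / `fp8_fake_quant` are hypothetical names.

```python
import numpy as np

E4M3_MAX = 448.0           # largest finite E4M3 value: 1.75 * 2**8
E4M3_MIN_NORMAL_EXP = -6   # smallest normal exponent (exponent field 1, bias 7)
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round float values to the nearest E4M3 value (ties to even), saturating at +/-448."""
    x = np.asarray(x, dtype=np.float64)
    sign = np.sign(x)
    a = np.minimum(np.abs(x), E4M3_MAX)  # saturate instead of overflowing
    # Exponent of each value's binade; values below the smallest normal share its
    # step size, which yields the evenly spaced subnormal grid.
    e = np.floor(np.log2(np.maximum(a, 2.0 ** E4M3_MIN_NORMAL_EXP)))
    step = 2.0 ** (e - E4M3_MANTISSA_BITS)  # spacing of 3-bit mantissa values in this binade
    q = np.round(a / step) * step           # np.round rounds exact halves to even
    return sign * np.minimum(q, E4M3_MAX)


def fp8_fake_quant(x):
    """Per-tensor scaled E4M3 fake quantization: map amax to 448, round, then scale back."""
    x = np.asarray(x, dtype=np.float64)
    amax = np.max(np.abs(x))
    if amax == 0:
        return x.copy()
    scale = amax / E4M3_MAX
    return round_to_e4m3(x / scale) * scale


if __name__ == "__main__":
    print(round_to_e4m3([0.3, 1.07, 17.0, 1000.0]))
    print(fp8_fake_quant(np.array([0.01, -0.5, 3.0])))
```

Unlike int8, the E4M3 grid is non-uniform: the relative rounding error stays roughly constant (at most 2^-4 for normal values) instead of growing for small magnitudes. This is one reason FP8 can be more forgiving on tensors with a wide dynamic range.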
Thank you for your great work, for both cutlass and cute. I'm following instructions to build my program. I use make_tensor to build rav as a pointer to specific register...
Please add us to the list too!
https://github.com/openppl-public/ppq
Hi, developer team of onnx2pytorch. I am currently developing a neural network quantization framework: https://github.com/openppl-public/ppq/tree/master/ppq. The really interesting part is that we both need to run an onnx model with...
I encountered some problems when using predicate tensors. In the tutorials: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/tiled_copy.cu https://github.com/NVIDIA/cutlass/blob/main/media/docs/cute/0y_predication.md There are examples of how to use tiled copy and predicate tensors, but I encountered several issues...