Zhexin Li
Hello hustzxd! Your repo is very helpful for my current work. However, I'm confused that the class _Conv2dQ uses a function, get_default_kwargs_q(), that appears to be undefined. I can't find it...
## Description Hi, I noticed from this [Issue](https://github.com/NVIDIA/TensorRT/issues/3243#issuecomment-1714849183) that the int8 MHA_v2 kernel only supports SeqLen > 512. I use pytorch_quantization to insert QDQ in the MHA and convert to TRT....
See README.md
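Hello, I've recently been using your cutlass code as a reference to write an int8 quantized conv. In your code, the int8 conv and the dequantize step are two separate kernels. I want to fuse the DQ into the conv so that it runs as a single kernel and saves memory accesses. I hope to do this with the EpilogueOp in DefaultConv2dFprop, that is, by setting alpha to input_scale * per_channel_weight_scale, where the formula is alpha *...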
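To make the idea concrete, here is a minimal pure-Python sketch of the arithmetic behind folding the dequantize step into a single per-channel scale. It is not CUTLASS code and does not model EpilogueOp. The function names (`int8_dot`, `conv_then_dequant`, `conv_fused_dequant`) are hypothetical, and a dot product stands in for one output element of the conv. It only assumes symmetric quantization (no zero points), in which case scaling the integer accumulator by alpha = input_scale * per_channel_weight_scale gives the same result as a separate dequantize pass.

```python
# Illustrative sketch only: names are hypothetical, not from CUTLASS or the repo.
# Assumes symmetric quantization (no zero points).

def int8_dot(x_q, w_q):
    """Integer accumulation for one output element (stand-in for the conv)."""
    return sum(a * b for a, b in zip(x_q, w_q))

def conv_then_dequant(x_q, w_q_per_channel, input_scale, weight_scales):
    """Two passes: integer accumulate first, then a separate dequantize step."""
    acc = [int8_dot(x_q, w_q) for w_q in w_q_per_channel]
    return [a * input_scale * s for a, s in zip(acc, weight_scales)]

def conv_fused_dequant(x_q, w_q_per_channel, input_scale, weight_scales):
    """One pass: the accumulator is scaled by a per-channel alpha right away."""
    alpha = [input_scale * s for s in weight_scales]  # one alpha per output channel
    return [alpha[c] * int8_dot(x_q, w_q)
            for c, w_q in enumerate(w_q_per_channel)]

if __name__ == "__main__":
    x_q = [1, -2, 3]                      # quantized input values
    w_q = [[4, 5, -6], [7, -8, 9]]        # quantized weights, two output channels
    print(conv_then_dequant(x_q, w_q, 0.5, [0.25, 0.125]))
    print(conv_fused_dequant(x_q, w_q, 0.5, [0.25, 0.125]))
```

Note that alpha here is a vector with one entry per output channel, so an epilogue that takes only a single scalar alpha would not be enough for per-channel weight scales.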