Results: 11 comments by pangr

> In the only example provided in the toolkit, it loaded the PTQ-calibrated weights and did QAT based on them. There isn't a standalone QAT example without PTQ...

> By default, TRT assumes that the network inputs/outputs are in FP32 linear (i.e. NCHW) format. However, many tactics in TRT require different formats, like NHWC8 or NC/32HW32 formats, so...
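The difference between the linear NCHW layout and a channel-packed format like NC/32HW32 can be illustrated with plain NumPy. This is a sketch of the general packing idea (channels grouped into blocks, padded to a multiple of the block size), not TensorRT's actual memory layout code; the function name is hypothetical.

```python
import numpy as np

# Hypothetical illustration: repack an NCHW tensor into a channel-blocked
# "NC/xHWx" layout, analogous to TensorRT's NC/32HW32 (blocks of 32 channels).
def nchw_to_ncxhwx(t: np.ndarray, x: int = 32) -> np.ndarray:
    n, c, h, w = t.shape
    pad = (-c) % x                      # pad channels up to a multiple of x
    t = np.pad(t, ((0, 0), (0, pad), (0, 0), (0, 0)))
    cb = (c + pad) // x                 # number of channel blocks
    # N, C/x, x, H, W  ->  N, C/x, H, W, x  (block of x channels innermost)
    return t.reshape(n, cb, x, h, w).transpose(0, 1, 3, 4, 2)

t = np.arange(2 * 48 * 4 * 4, dtype=np.float32).reshape(2, 48, 4, 4)
packed = nchw_to_ncxhwx(t, 32)
print(packed.shape)  # (2, 2, 4, 4, 32): 48 channels padded to 64 = 2 blocks of 32
```

The point of such formats is that a block of channels sits contiguously in memory, which is what the vectorized INT8/FP16 tactics operate on.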

> https://github.com/NVIDIA/TensorRT/tree/main/tools/onnx-graphsurgeon and https://github.com/NVIDIA/TensorRT/tree/main/tools/onnx-graphsurgeon/examples/04_modifying_a_model

Sorry, it doesn't work. Is there any inference in pytorch_quantization? I just don't want to quantize "Gemm".

> Or create a case and run it with trtexec --verbose; you will be able to see the final engine structure in the log, which will tell you if TRT can support...
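A minimal invocation along those lines might look like this (assuming an ONNX model file named `model.onnx`; the flags are standard trtexec options):

```shell
# Build an INT8 engine and dump the layer-by-layer engine structure.
# Fused layers appear as combined entries in the verbose log.
trtexec --onnx=model.onnx --int8 --verbose 2>&1 | tee build.log
```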

Does TensorRT support LeakyReLU quantization?

> Whether the fusion happens depends on whether TRT has tactics supporting that. The very rough guidelines are:
>
> * Conv+LeakyReLU should be fused in FP16 or in INT8...

When I asked "Who is founder of goolge.com?", llama13B answered as shown in the figure below: "tro tro tro tro tro tro tro tro tro tro tro...

> We have implemented W8A8 inference in vLLM, which can achieve a 30% improvement in throughput. W4A16 quantization methods require weights to be dequantized into fp16 before compute and lead...
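The distinction can be sketched in NumPy (an illustration of the arithmetic, not vLLM's kernels): in W8A8 both weights and activations are int8, so the matmul itself runs on integers, while in W4A16 the 4-bit weights must be dequantized back to fp16 before a floating-point matmul. All scales here use simple per-tensor max-abs quantization for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)   # weights
x = rng.standard_normal((8, 64)).astype(np.float32)    # activations

# W8A8: weights AND activations quantized to int8; the matmul accumulates
# in int32, with a single dequantization at the end.
sw = np.abs(w).max() / 127.0
sx = np.abs(x).max() / 127.0
w8 = np.round(w / sw).astype(np.int8)
x8 = np.round(x / sx).astype(np.int8)
y_w8a8 = (x8.astype(np.int32) @ w8.T.astype(np.int32)) * (sw * sx)

# W4A16: weights quantized to 4 bits, but they must be dequantized to fp16
# before compute, so the matmul is still a floating-point matmul.
sw4 = np.abs(w).max() / 7.0
w4 = np.clip(np.round(w / sw4), -8, 7).astype(np.int8)  # 4-bit signed range
w_dequant = w4.astype(np.float16) * np.float16(sw4)     # dequant-before-compute
y_w4a16 = x.astype(np.float16) @ w_dequant.T

# Both approximate the fp32 reference; W8A8 skips the dequant-before-compute step.
y_ref = x @ w.T
print(np.abs(y_w8a8 - y_ref).max(), np.abs(y_w4a16.astype(np.float32) - y_ref).max())
```

The dequantization step in the W4A16 path is the extra per-layer work the comment refers to; it is what limits throughput at higher batch sizes even though the 4-bit weights save memory bandwidth.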

> It's best to add some other speakers; with too little data, training easily leads to model collapse and catastrophic forgetting.

I added the training set from aishell3 and trained on it together, and froze the MRTE parameters; after 300000 steps the result is still the same. Is there any parameter configuration that needs adjusting?

> Hi~, the training here is meant to let the model accept the MSAC input proposed in the paper. But we didn't have that much data to train the model, so the learning rate is set relatively small, letting the model learn the MSAC input without destroying its pretrained capabilities.

Understood, thanks for the reply. One more question: during mini-monkey's entire pretraining, was only the text LLM trained, with neither the vision module nor the modality-alignment module trained?