Robert Yang

3 comments by Robert Yang

Hi Nathan, for FP8 quantization there are currently two options: SmoothQuant and AWQ. For example, to enable FP8 SmoothQuant, the options you can add are ```...

Hi Riley, thanks for raising the issue. This is most likely an error in the checkpoint conversion script in NVIDIA/TensorRT-LLM, since it directly loads the weights...

That's right: TensorRT-LLM changed how it loads the model between 0.7.1 and 0.8.0, which may have caused the issue. We're also looking...