Robert Yang comments

Results: 3 comments

Support for FP8 quantization with TensorRT-LLM

Hi Nathan, for FP8 quantization there are currently two choices: SmoothQuant and AWQ. For example, to enable FP8 SmoothQuant, the options you can add are ```...

DJL-TensorRT-LLM Bug: TypeError: Got unsupported ScalarType BFloat16

Hi Riley, thanks for raising the issue. This is most likely an error in the checkpoint conversion script in NVIDIA/TensorRT-LLM, since it is directly loading the weights...

DJL-TensorRT-LLM Bug: TypeError: Got unsupported ScalarType BFloat16

That's right - we know that TensorRT-LLM changed how it loads the model between 0.7.1 and 0.8.0, which may have caused the issue. We're also looking...