TensorRT-LLM When will FP8 be available for Mixtral?

Open • Pernekhan opened this issue 11 months ago • 11 comments

Could you share a rough timeline for FP8 quantization support for the Mixtral (MoE) model?

cc: @Tracin

Mar 04 '24 22:03 Pernekhan