TensorRT-LLM
Why does the fp8_e4m3 min_scaling_factor divide by 512?
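In cudaFp8Utils.cu (https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/common/cudaFp8Utils.cu#L219), the minimum scaling factor is defined as:

constexpr float min_scaling_factor = 1.0f / (FP8_E4M3_MAX * 512.f);

Why is the extra factor 512? Where does that value come from?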
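To make the question concrete, here is a minimal sketch of what the constant works out to numerically. It does not explain why 512 was chosen; it only shows what the value implies. It assumes FP8_E4M3_MAX is 448 (the largest finite E4M3 value) and that the scale is computed as amax / FP8_E4M3_MAX clamped from below by min_scaling_factor. The helper name clamped_scale and the constant amax_floor are hypothetical and are not TensorRT-LLM identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Largest finite value representable in FP8 E4M3.
constexpr float FP8_E4M3_MAX = 448.0f;

// The constant from the linked line: 1 / (448 * 512) = 1 / 229376, about 4.36e-6.
constexpr float min_scaling_factor = 1.0f / (FP8_E4M3_MAX * 512.f);

// ASSUMPTION: the quantization scale is amax / FP8_E4M3_MAX with a lower bound.
// clamped_scale is a hypothetical helper used only for illustration.
inline float clamped_scale(float amax)
{
    return std::max(amax / FP8_E4M3_MAX, min_scaling_factor);
}

// Under that assumption, the floor takes effect once amax drops below
// min_scaling_factor * FP8_E4M3_MAX, which is about 1/512 (roughly 0.00195).
constexpr float amax_floor = min_scaling_factor * FP8_E4M3_MAX;
```

Read this way, the factor 512 behaves like a lower bound of about 1/512 on the effective amax: any tensor whose absolute maximum is smaller than that gets the same scale, and an all-zero tensor gets a nonzero scale instead of 0. Whether that is the intended rationale is not stated in the linked code.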