Wanli-Jiang

Results 3 issues of Wanli-Jiang

Features:

- Added MMMU accuracy for phi4mm image modality.
- Added unit tests for phi4mm.
- Added documentation for phi4mm.

## Summary by CodeRabbit

## Release Notes

* **Documentation**
  * Added...

# Note

* commit 1 is the same as https://github.com/NVIDIA/TensorRT-LLM/pull/9261
* I will rebase the PR once https://github.com/NVIDIA/TensorRT-LLM/pull/9261 is merged, so that only commit 2 is the real code change....

# Features

* Verified with VANILLA and CUTLASS MoE backends.
* Support BF16 / FP8 / NVFP4 models.
* Support multi-stream for MoE shared and MoE chunking.

## Summary by...