Torch Profiler Shows Zero Tensor Core Utilization for torch.nn.Conv3d, While Nsight Compute Confirms Usage

Open · BurkeHulk opened this issue · 1 comment

Description

I profiled torch.nn.Conv3d with both PyTorch's built-in profiler and Nsight Compute. When I view the PyTorch profiler results in TensorBoard, Tensor Core utilization is reported as zero. However, Nsight Compute shows that Tensor Cores are in fact being used.

Upon investigating the codebase, I found that the Tensor Core allowlist (TC_Allowlist) in [tb_plugin/torch_tb_profiler/profiler/tensor_core.py](https://github.com/pytorch/kineto/blob/main/tb_plugin/torch_tb_profiler/profiler/tensor_core.py) appears to be outdated.

The kernel used in Conv3d is:

sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn

However, xmma_fprop_implicit_gemm is not included in the allowlist, which might explain the discrepancy.
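The mismatch can be illustrated with a minimal sketch. It assumes the plugin decides Tensor Core usage by checking whether any allowlist entry occurs as a substring of the kernel name. The pattern list below is a hypothetical excerpt chosen for illustration, not the plugin's actual TC_Allowlist. The sketch shows that a fragment such as xmma_implicit_gemm would not match this Conv3d kernel, because fprop_ sits between xmma_ and implicit_gemm.

```python
# Hypothetical excerpt of kernel-name fragments; NOT the plugin's real allowlist.
ALLOWLIST_EXCERPT = ["h884", "s1688", "hmma", "xmma_implicit_gemm", "xmma_gemm"]


def uses_tensor_cores(kernel_name: str, patterns=ALLOWLIST_EXCERPT) -> bool:
    """Return True if any pattern occurs as a substring of the kernel name."""
    return any(pattern in kernel_name for pattern in patterns)


# The Conv3d kernel name reported in this issue.
CONV3D_KERNEL = (
    "sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
    "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn"
)

if __name__ == "__main__":
    # "xmma_implicit_gemm" is not a substring: "fprop_" sits in between.
    print(uses_tensor_cores(CONV3D_KERNEL))  # False
    # Adding the fragment from this issue makes the check succeed.
    patched = ALLOWLIST_EXCERPT + ["xmma_fprop_implicit_gemm"]
    print(uses_tensor_cores(CONV3D_KERNEL, patched))  # True
```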

Expected Behavior

When PyTorch profiler results are viewed in TensorBoard, Tensor Core utilization should be reported correctly for kernels that run on Tensor Cores.

Suggested Fix

The allowlist should be updated to include xmma_fprop_implicit_gemm and any other kernel-name patterns that run on Tensor Cores.
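One possible shape for the fix is sketched below. It assumes substring matching as before and adds a regular expression so that one entry covers the fprop, dgrad and wgrad variants of cuDNN's xmma implicit-GEMM kernels, on the assumption that the backward kernels follow the same naming. The substring list is a hypothetical excerpt, and the wgrad and elementwise kernel names in the usage block are made up for illustration.

```python
import re

# Hypothetical excerpt of plain substring patterns; NOT the plugin's real list.
SUBSTRING_PATTERNS = ["hmma", "xmma_implicit_gemm", "xmma_gemm"]
# Assumed naming: xmma_<fprop|dgrad|wgrad>_implicit_gemm for cuDNN conv kernels.
XMMA_IMPLICIT_GEMM = re.compile(r"xmma_(?:fprop|dgrad|wgrad)_implicit_gemm")


def uses_tensor_cores(kernel_name: str) -> bool:
    """Return True if the name contains a substring pattern or matches the regex."""
    if any(p in kernel_name for p in SUBSTRING_PATTERNS):
        return True
    return XMMA_IMPLICIT_GEMM.search(kernel_name) is not None


CONV3D_KERNEL = (
    "sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
    "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn"
)

if __name__ == "__main__":
    names = [
        CONV3D_KERNEL,                          # forward kernel from this issue
        "sm90_xmma_wgrad_implicit_gemm_demo",   # hypothetical backward kernel
        "elementwise_add_kernel",               # hypothetical non-Tensor-Core kernel
    ]
    for name in names:
        print(uses_tensor_cores(name), name[:40])
```

A regex keeps the list short as new variants appear, at the cost of being easier to over-match than an exact fragment.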

Environment

PyTorch Version: 2.6.0+cu124
CUDA Version: 12.4
GPU: NVIDIA H200
Profiling Tools: PyTorch Profiler, Nsight Compute 2024.1.1.0 (build 33998838)
torch-tb-profiler: 0.4.3

Feb 14 '25 06:02 BurkeHulk

Same question here. Have you solved it?

Dec 10 '25 07:12 wfloveiu