bug: Torch.dynamo is not working on H100 due to obsolete triton & pytorch

Open Artyom17 opened this issue 1 year ago • 0 comments


Torch.dynamo is not working on H100 due to obsolete triton & pytorch

Steps to reproduce

Easily reproducible on H100 by running 'pytest -k benchmark'

Expected Behavior


Actual Behavior

Doesn't work. The root cause is the old Triton (v2.0.0), which has no support for H100 (sm_90). Two errors appear. The first one comes from PyTorch:

  NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
  The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
  If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at

This one can be solved by installing a newer Torch, 2.0.1+cu118, from the suggested URL.

The second one is a Triton issue:
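
To confirm this kind of mismatch before running the benchmarks, the GPU's compute capability can be compared against the architectures the installed PyTorch build was compiled for. The following is a minimal sketch, not PyTorch's own check. `is_arch_supported` and `ISSUE_ARCH_LIST` are hypothetical names introduced here, and the rule used (same major version, built minor not above the device minor) is a simplifying assumption that ignores PTX (`compute_XX`) entries. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are only called when torch and a CUDA device are present.

```python
import re

# Architecture list copied from the PyTorch warning quoted in this issue.
ISSUE_ARCH_LIST = "sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86".split()


def is_arch_supported(capability, arch_list):
    """Simplified check (assumption): a build covers a device when it ships an
    sm_XY image with the same major version and a minor version <= the device's."""
    dev_major, dev_minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match is None:
            continue  # skip PTX entries such as "compute_80"
        major, minor = divmod(int(match.group(1)), 10)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Query the real device and the real build.
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
    else:
        # Fall back to the values reported in this issue: H100 is sm_90.
        cap, archs = (9, 0), ISSUE_ARCH_LIST
    print(cap, archs, is_arch_supported(cap, archs))
```

With the values from the warning above, an sm_90 device is reported as unsupported, which matches the first error.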

E       RuntimeError: CUDA error: no kernel image is available for execution on the device
E       CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Triton v2.0.0 has a limitation: it only supports architectures below sm_90 (exclusive). I could not easily install a newer Triton, since it is reported as incompatible. However, I was able to hack Triton: I cloned it locally, checked out the v2.0.0 tag, and reverted commit d54c04ab. I am not sure it is using all SMs correctly on H100 after this surgery.

Your environment

Using Docker:

DOCKER_BUILDKIT=1 docker build -t kernl .
docker run --rm -it --gpus all -v $(pwd):/kernl kernl

Also tried the more recent NVIDIA Docker image (12.2.0-devel-ubuntu22.04), with the same result.
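
For anyone reproducing this in another image, the installed package versions can be listed from inside the container. This is a small sketch using only the standard library. `installed_versions` is a hypothetical helper name, and the distribution names `torch` and `triton` are assumed to be the ones used in the image.

```python
from importlib import metadata


def installed_versions(packages=("torch", "triton")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```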


