🐛 [Bug] Cannot load quantize_fp8 even though modelopt[all] is installed
Bug Description
quantize_fp8 cannot be loaded even though modelopt[all] is installed. Importing torch_tensorrt emits the following warning:
WARNING:torch_tensorrt.dynamo.conversion.aten_ops_converters:Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
WARNING:py.warnings:/usr/lib64/python3.11/tempfile.py:904: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp91sqmx7h'>
_warnings.warn(warn_message, ResourceWarning)
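For context on where the warning comes from: the converter registration wraps the modelopt op lookup in a try/except and falls back to a warning when the lookup fails. A minimal sketch of that pattern, assuming the op name torch_tensorrt 2.4.0 looks for (the actual torch_tensorrt source differs):

```python
import logging

import torch

logger = logging.getLogger(__name__)

try:
    # Importing modelopt's quantization package registers its custom ops.
    import modelopt.torch.quantization  # noqa: F401

    # torch_tensorrt 2.4.0 resolves the op under the "trt" namespace;
    # this raises AttributeError if modelopt registered it elsewhere.
    quantize_fp8 = torch.ops.trt.quantize_fp8
except Exception:
    logger.warning(
        "Unable to import quantization op. Please install modelopt library "
        "(https://github.com/NVIDIA/TensorRT-Model-Optimizer"
        "?tab=readme-ov-file#installation) to add support for compiling "
        "quantized models"
    )
```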
+ exec python -c 'import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)'
Loading extension modelopt_cuda_ext...
<module 'modelopt_cuda_ext' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext/modelopt_cuda_ext.so'>
Loading extension modelopt_cuda_ext_fp8...
<module 'modelopt_cuda_ext_fp8' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext_fp8/modelopt_cuda_ext_fp8.so'>
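Since both CUDA extensions load, the failure is in the op lookup rather than in modelopt itself. A quick diagnostic sketch (op names taken from the Additional context below) shows which name is actually registered:

```python
import torch
import modelopt.torch.quantization  # noqa: F401  -- side effect: registers the custom ops

# torch_tensorrt 2.4.0 looks for trt::quantize_fp8, while nvidia-modelopt
# 0.17.0 registers tensorrt::quantize_op, so only the second should print
# as registered here.
for namespace, op_name in [("trt", "quantize_fp8"), ("tensorrt", "quantize_op")]:
    found = hasattr(getattr(torch.ops, namespace), op_name)
    print(f"torch.ops.{namespace}.{op_name}: {'registered' if found else 'missing'}")
```

Relevant installed versions (pip list):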
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-modelopt 0.17.0
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
torch 2.4.1
torch_tensorrt 2.4.0
torchaudio 2.4.1
torchinfo 1.8.0
torchmetrics 1.4.3
torchprofile 0.0.4
torchvision 0.19.1
tensorrt 10.1.0
tensorrt-cu12 10.5.0
tensorrt-cu12-bindings 10.1.0
tensorrt-cu12-libs 10.1.0
To Reproduce
Steps to reproduce the behavior:
- create venv and activate it
- install torch, torchvision, torchaudio, tensorrt, nvidia-modelopt[all], and torch_tensorrt
- python -c "import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)"
- python -c "import torch_tensorrt"
Expected behavior
torch_tensorrt should import without the quantization warning.
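To make this expectation checkable, the log output emitted during import can be captured (logger behavior and message taken from the warning above; this is just a verification sketch, not an official test):

```python
import logging

# Collect warning-level records emitted while torch_tensorrt is imported.
captured = []

class _Collector(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

logging.getLogger().addHandler(_Collector(level=logging.WARNING))

import torch_tensorrt  # noqa: E402  -- imported after the handler on purpose

assert not any("Unable to import quantization op" in msg for msg in captured), captured
print("torch_tensorrt imported without the quantization warning")
```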
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 2.4.0
- PyTorch Version (e.g. 1.0): 2.4.1
- CPU Architecture: x86_64
- OS (e.g., Linux): Ubuntu
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.11
- CUDA version: 12.1
- GPU models and configuration: RTX4k
- Any other relevant information:
Additional context
torch_tensorrt 2.4.0 used torch.ops.trt.quantize_fp8 at the time of release. The latest main branch has since switched to torch.ops.tensorrt.quantize_op for nvidia-modelopt 0.17.0. You can install the latest nightly build with:
pip install --pre -U torch torchaudio torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121 --extra-index-url https://pypi.nvidia.com
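After upgrading to the nightly, the import and the new op name can be verified in one go (op name as stated above; a quick sanity check, not an official test):

```python
import torch
import modelopt.torch.quantization  # noqa: F401  -- registers the custom ops
import torch_tensorrt  # should now import without the quantization warning

# The nightly converters target this op for nvidia-modelopt 0.17.0.
print(torch.ops.tensorrt.quantize_op)
```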
@braindevices Closing this issue, since it has been addressed by @HolyWu.