🐛 [Bug] Cannot load quantize_fp8 even though modelopt[all] is installed
Bug Description
quantize_fp8 cannot be loaded even though modelopt[all] is installed. Importing torch_tensorrt emits the following warning:
WARNING:torch_tensorrt.dynamo.conversion.aten_ops_converters:Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
WARNING:py.warnings:/usr/lib64/python3.11/tempfile.py:904: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp91sqmx7h'>
_warnings.warn(warn_message, ResourceWarning)
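For context on where the warning comes from: the converter registration wraps the modelopt op lookup in a try/except and falls back to a warning when the lookup fails. A minimal sketch of that pattern, assuming the op name torch_tensorrt 2.4.0 looks for (the actual torch_tensorrt source differs):

```python
import logging

import torch

logger = logging.getLogger(__name__)

try:
    # Importing modelopt's quantization package registers its custom ops.
    import modelopt.torch.quantization  # noqa: F401

    # torch_tensorrt 2.4.0 resolves the op under the "trt" namespace;
    # this raises AttributeError if modelopt registered it elsewhere.
    quantize_fp8 = torch.ops.trt.quantize_fp8
except Exception:
    logger.warning(
        "Unable to import quantization op. Please install modelopt library "
        "(https://github.com/NVIDIA/TensorRT-Model-Optimizer"
        "?tab=readme-ov-file#installation) to add support for compiling "
        "quantized models"
    )
```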
+ exec python -c 'import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)'
Loading extension modelopt_cuda_ext...
<module 'modelopt_cuda_ext' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext/modelopt_cuda_ext.so'>
Loading extension modelopt_cuda_ext_fp8...
<module 'modelopt_cuda_ext_fp8' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext_fp8/modelopt_cuda_ext_fp8.so'>
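Since both CUDA extensions load, the failure is in the op lookup rather than in modelopt itself. A quick diagnostic sketch (op names taken from the Additional context below) shows which name is actually registered:

```python
import torch
import modelopt.torch.quantization  # noqa: F401  -- side effect: registers the custom ops

# torch_tensorrt 2.4.0 looks for trt::quantize_fp8, while nvidia-modelopt
# 0.17.0 registers tensorrt::quantize_op, so only the second should print
# as registered here.
for namespace, op_name in [("trt", "quantize_fp8"), ("tensorrt", "quantize_op")]:
    found = hasattr(getattr(torch.ops, namespace), op_name)
    print(f"torch.ops.{namespace}.{op_name}: {'registered' if found else 'missing'}")
```

Relevant installed versions (pip list):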
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-modelopt 0.17.0
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
torch 2.4.1
torch_tensorrt 2.4.0
torchaudio 2.4.1
torchinfo 1.8.0
torchmetrics 1.4.3
torchprofile 0.0.4
torchvision 0.19.1
tensorrt 10.1.0
tensorrt-cu12 10.5.0
tensorrt-cu12-bindings 10.1.0
tensorrt-cu12-libs 10.1.0
To Reproduce
Steps to reproduce the behavior:
- create venv and activate it
- install torch, torchvision, torchaudio, tensorrt, nvidia-modelopt[all], and torch_tensorrt
- python -c "import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)"
- python -c "import torch_tensorrt"
Expected behavior
torch_tensorrt should import without the quantization warning.
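To make this expectation checkable, the log output emitted during import can be captured (logger behavior and message taken from the warning above; this is just a verification sketch, not an official test):

```python
import logging

# Collect warning-level records emitted while torch_tensorrt is imported.
captured = []

class _Collector(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

logging.getLogger().addHandler(_Collector(level=logging.WARNING))

import torch_tensorrt  # noqa: E402  -- imported after the handler on purpose

assert not any("Unable to import quantization op" in msg for msg in captured), captured
print("torch_tensorrt imported without the quantization warning")
```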
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 2.4.0
- PyTorch Version (e.g. 1.0): 2.4.1
- CPU Architecture: x86_64
- OS (e.g., Linux): Ubuntu
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.11
- CUDA version: 12.1
- GPU models and configuration: RTX4k
- Any other relevant information:
Additional context
torch_tensorrt 2.4.0 used torch.ops.trt.quantize_fp8 at the time of release. The latest main branch has since switched to torch.ops.tensorrt.quantize_op for nvidia-modelopt 0.17.0. You can install the latest nightly build with:
pip install --pre -U torch torchaudio torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121 --extra-index-url https://pypi.nvidia.com
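After upgrading to the nightly, the import and the new op name can be verified in one go (op name as stated above; a quick sanity check, not an official test):

```python
import torch
import modelopt.torch.quantization  # noqa: F401  -- registers the custom ops
import torch_tensorrt  # should now import without the quantization warning

# The nightly converters target this op for nvidia-modelopt 0.17.0.
print(torch.ops.tensorrt.quantize_op)
```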
@braindevices Closing this issue, since it has been addressed by @HolyWu.