
🐛 [Bug] cannot load quantize_fp8 even though modelopt[all] is installed


Bug Description

torch_tensorrt cannot load quantize_fp8 even though modelopt[all] is installed:

WARNING:torch_tensorrt.dynamo.conversion.aten_ops_converters:Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
WARNING:py.warnings:/usr/lib64/python3.11/tempfile.py:904: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp91sqmx7h'>
  _warnings.warn(warn_message, ResourceWarning)
+ exec python -c 'import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)'
Loading extension modelopt_cuda_ext...
<module 'modelopt_cuda_ext' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext/modelopt_cuda_ext.so'>
Loading extension modelopt_cuda_ext_fp8...
<module 'modelopt_cuda_ext_fp8' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext_fp8/modelopt_cuda_ext_fp8.so'>
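
For context, the warning comes from an import guard in torch_tensorrt's converter registration. The following is a simplified sketch of that pattern, not the exact torch_tensorrt source; the assumption is that the guard resolves the quantize op by name, which is where it fails here even though modelopt itself imports and its CUDA extensions load:

import logging

import torch

logger = logging.getLogger(__name__)

try:
    import modelopt.torch.quantization  # noqa: F401  (registers modelopt's custom ops)

    # torch_tensorrt 2.4.0 looks the op up under this name; nvidia-modelopt
    # 0.17.0 registers torch.ops.tensorrt.quantize_op instead, so this lookup
    # raises and the except branch emits the warning seen above.
    assert torch.ops.trt.quantize_fp8 is not None
except Exception:
    logger.warning(
        "Unable to import quantization op. Please install modelopt library ..."
    )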

nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-modelopt          0.17.0
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.82
nvidia-nvtx-cu12         12.1.105

torch                    2.4.1
torch_tensorrt           2.4.0
torchaudio               2.4.1
torchinfo                1.8.0
torchmetrics             1.4.3
torchprofile             0.0.4
torchvision              0.19.1
tensorrt                 10.1.0
tensorrt-cu12            10.5.0
tensorrt-cu12-bindings   10.1.0
tensorrt-cu12-libs       10.1.0

To Reproduce

Steps to reproduce the behavior:

  1. create venv and activate it
  2. install torch, torchvision, torchaudio, tensorrt, nvidia-modelopt[all], and torch_tensorrt
  3. python -c "import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)"
  4. python -c "import torch_tensorrt" (the warning appears here; a probe sketch follows this list)
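
A quick probe (a sketch; it assumes importing modelopt.torch.quantization is what registers the custom ops) shows which quantize op name the installed modelopt actually provides, and explains why step 3 succeeds while step 4 warns:

import torch
import modelopt.torch.quantization  # noqa: F401  (assumed to register the custom ops)

# torch_tensorrt 2.4.0 looks for the first name; nvidia-modelopt 0.17.0
# registers the second, so only one of these probes should succeed.
for namespace, name in [("trt", "quantize_fp8"), ("tensorrt", "quantize_op")]:
    try:
        op = getattr(getattr(torch.ops, namespace), name)
        print(f"torch.ops.{namespace}.{name} -> {op}")
    except (AttributeError, RuntimeError):
        print(f"torch.ops.{namespace}.{name} is not registered")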

Expected behavior

torch_tensorrt should import without the quantization-op warning.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.4.0
  • PyTorch Version (e.g. 1.0): 2.4.1
  • CPU Architecture: x86_64
  • OS (e.g., Linux): Ubuntu
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.11
  • CUDA version: 12.1
  • GPU models and configuration: RTX4k
  • Any other relevant information:

Additional context

braindevices · Oct 13 '24 01:10

torch_tensorrt 2.4.0 used torch.ops.trt.quantize_fp8 at the time of release. The latest main branch has since switched to torch.ops.tensorrt.quantize_op to match nvidia-modelopt 0.17.0. You can install the latest nightly build with:

pip install --pre -U torch torchaudio torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121 --extra-index-url https://pypi.nvidia.com
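
After installing the nightly, a quick sanity check (a sketch based on the op rename described above) is to import torch_tensorrt and resolve the op directly; the warning should be gone:

import torch
import torch_tensorrt  # should now import without the quantization-op warning
import modelopt.torch.quantization  # noqa: F401

# With a matching nightly and nvidia-modelopt 0.17.0, the renamed op should resolve.
print(torch.ops.tensorrt.quantize_op)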

HolyWu · Oct 13 '24 02:10

@braindevices Closing this issue since it has been addressed by @HolyWu.

lanluo-nvidia · May 20 '25 21:05