Description

This PR adds support for the FP8 and BF16 data types. It also implements a converter for FP8 quantized ops.
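
As a rough illustration of what an FP8 quantize/dequantize step computes, here is a pure-Python sketch. It is not the converter in this PR: it assumes the E4M3 format (4 exponent bits, 3 mantissa bits, maximum finite value 448), per-tensor scaling by `amax / 448`, and saturation on overflow. The names `round_to_e4m3` and `fake_quantize_fp8` are hypothetical helpers made up for this example.

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(value: float) -> float:
    """Round a float to the nearest value representable in FP8 E4M3."""
    if value == 0.0 or math.isnan(value):
        return value
    sign = -1.0 if value < 0 else 1.0
    mag = min(abs(value), E4M3_MAX)  # assumption: saturate instead of overflowing
    # frexp gives mag = m * 2**e with m in [0.5, 1), so the binade exponent is e - 1.
    _, e = math.frexp(mag)
    exponent = max(e - 1, -6)  # -6 is the smallest normal exponent; below it, subnormals
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits give 8 steps per binade
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fake_quantize_fp8(values, amax):
    """Scale into FP8 range, round to the E4M3 grid, then scale back (hypothetical helper)."""
    scale = amax / E4M3_MAX  # assumption: per-tensor scale derived from amax
    return [round_to_e4m3(v / scale) * scale for v in values]
```

Calling `fake_quantize_fp8(xs, max(abs(x) for x in xs))` on a list of floats `xs` returns the values a consumer would see after a quantize/dequantize round trip under these assumptions.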

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

[ ] My code follows the style guidelines of this project (You can use the linters)
[ ] I have performed a self-review of my own code
[ ] I have commented my code, particularly in hard-to-understand areas and hacks
[ ] I have made corresponding changes to the documentation
[ ] I have added tests to verify my fix or my feature
[ ] New and existing unit tests pass locally with my changes
[ ] I have added the relevant labels to my PR so that relevant reviewers are notified

Apr 18 '24 23:04 peri044

@peri044 I remember we already removed the cuDNN dependency on release/2.3, but it is still referenced here: https://github.com/pytorch/TensorRT/blob/3f6999d6ab2f9b62c63b78e8405cacf216370214/py/torch_tensorrt/__init__.py#L9

This causes an error when I import torch-trt:

>>> import torch_tensorrt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scratch.zewenl_sw/docker_workspace/TRT10/TensorRT/py/torch_tensorrt/__init__.py", line 7, in <module>
    from torch_tensorrt._version import (  # noqa: F401
ImportError: cannot import name '__cudnn_version__' from 'torch_tensorrt._version' (/home/scratch.zewenl_sw/docker_workspace/TRT10/TensorRT/py/torch_tensorrt/_version.py)

Can you take a look?
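
For reference, a minimal sketch of the failure mode. It uses a hypothetical stand-in module (`fake_trt_version`) instead of the real `torch_tensorrt._version`, and the version string is a placeholder. Importing a name that the module no longer defines raises the same kind of `ImportError`, and `hasattr` is a quick way to check what the module actually exports.

```python
import sys
import types

# Hypothetical stand-in for a _version module that no longer defines
# __cudnn_version__ (assumed state after the cuDNN removal).
fake_version = types.ModuleType("fake_trt_version")
fake_version.__version__ = "0.0.0"  # placeholder, not the real version
sys.modules["fake_trt_version"] = fake_version

try:
    # Mirrors a stale import of a name the module no longer provides.
    from fake_trt_version import __cudnn_version__  # noqa: F401
    import_failed = False
    message = ""
except ImportError as exc:
    import_failed = True
    message = str(exc)

# Quick check of whether the name is exported at all.
has_cudnn = hasattr(fake_version, "__cudnn_version__")
```

If `has_cudnn` is false, the import list in `__init__.py` needs to drop that name as well.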

May 17 '24 21:05 zewenli98

fixed it now

May 17 '24 21:05 peri044

Cool, thanks! Did you also implement a unit test for torch.ops.trt.quantize_fp8.default?

May 17 '24 21:05 zewenli98

@peri044 Thanks for the comments. I have refactored based on your suggestions.

May 21 '24 23:05 zewenli98

feat: Implement FP8 functionality