Does this only support Hopper GPUs?
Very good work, but I have a few questions.
When I tried to run the code, I encountered the following error:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/.conda/envs/torch2.4/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/.conda/envs/torch2.4/lib/python3.10/site-packages/triton/language/core.py", line 993, in to
[rank0]:     return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
[rank0]:   File "/.conda/envs/torch2.4/lib/python3.10/site-packages/triton/language/semantic.py", line 759, in cast
[rank0]:     assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
[rank0]: AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
```
So the project is still subject to the FP8 hardware restriction?
I only have access to A100 GPUs at the moment. Will quantization and inference on A100 GPUs be supported in the future? If so, how many A100-40G GPUs would be required?
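For context, the assertion in the traceback comes from Triton, which only allows the `fp8e4nv` (FP8 e4m3) data type on GPUs with CUDA compute capability 8.9 or higher (sm_89 Ada, sm_90 Hopper); an A100 is sm_80, which is why the cast fails. A minimal sketch of the check (the helper name `supports_fp8e4nv` is illustrative, not part of the Triton API):

```python
def supports_fp8e4nv(major: int, minor: int) -> bool:
    """Return True if a GPU with compute capability (major, minor)
    can use Triton's fp8e4nv (e4m3) type, which requires arch >= 8.9."""
    return (major, minor) >= (8, 9)

# A100 is sm_80, H100 is sm_90; you can obtain these values at runtime
# with torch.cuda.get_device_capability().
print(supports_fp8e4nv(8, 0))  # A100 -> False
print(supports_fp8e4nv(9, 0))  # H100 -> True
```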
Which model do you need to quantize? Please provide your configuration file.
The latest code should now be able to run FP8 models on A100 GPUs.