Jiangtao lv
cuDNN version is 9.9.0
Transformer Engine version is 2.2.0+c55e425
Hello everyone, I've discovered a new speed issue. When my input shape changes, the speedup of TE's Linear over torch's Linear changes as well. Here is the code ``` import...
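The shape-dependent speedup can be illustrated with a toy cost model (the constants below are purely illustrative assumptions, not measurements): FP8 adds a roughly fixed per-call overhead (quantize/dequantize, scale bookkeeping), while the GEMM savings grow with the amount of matmul work, so small inputs can even come out slower.

```python
# Toy cost model (illustrative constants, NOT measurements) for why the
# FP8 speedup depends on input shape.

def speedup(work, k=1.0, overhead=50.0):
    """Ratio of plain-linear cost to FP8-linear cost for a given
    amount of matmul work (arbitrary units)."""
    plain = k * work                 # plain linear: cost scales with work
    fp8 = overhead + 0.5 * k * work  # FP8: fixed overhead + ~2x faster GEMM
    return plain / fp8

for work in (10, 100, 1000, 10000):
    print(work, round(speedup(work), 2))
```

Under this model, small shapes give a speedup below 1x (FP8 is slower) and large shapes approach the 2x GEMM advantage, which matches the pattern of the speedup changing with input shape.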
Thank you for your reply. I now understand why the inference speed of FP8 is slower.
I am currently using torch.profiler to measure the time taken by inference on both the CPU and GPU during code execution. The detailed results and code are shown below. From...
I have the same question.
I think the extra time is due to the quantization and dequantization layers that the FP8 path introduces. @DDDaar
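A minimal pure-Python sketch of the extra steps an FP8 path performs around the matmul. This is illustrative only; the function names and the integer-rounding scheme are simplified assumptions, not the actual Transformer Engine kernels:

```python
def quantize(x, scale):
    # Scale and round, mimicking the cast to a narrow FP8 range.
    return [round(v * scale) for v in x]

def fp8_dot(x, w, scale_x, scale_w):
    qx = quantize(x, scale_x)                 # extra step 1: quantize input
    qw = quantize(w, scale_w)                 # extra step 2: quantize weight
    acc = sum(a * b for a, b in zip(qx, qw))  # low-precision dot product
    return acc / (scale_x * scale_w)          # extra step 3: dequantize result

x = [0.1, 0.2, 0.3]
w = [1.0, 2.0, 3.0]
print(fp8_dot(x, w, scale_x=100.0, scale_w=10.0))  # 1.4, same as the FP32 dot product
```

For small input shapes, these extra quantize/dequantize steps (each a separate kernel launch on the GPU) can dominate the total time, which is consistent with FP8 inference being slower in those cases.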