Jiangtao lv
cuDNN version is 9.9.0
Transformer Engine version is 2.2.0+c55e425
Hello everyone, I've discovered a new speed issue. When my input shape changes, the speedup of TE's Linear over torch's Linear changes as well. Here is the code ``` import...
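The shape-dependent speedup can be illustrated with a toy cost model (the constants below are purely illustrative assumptions, not measurements): FP8 adds a roughly fixed per-call overhead (quantize/dequantize, scale bookkeeping), while the GEMM savings grow with the amount of matmul work, so small inputs can even come out slower.

```python
# Toy cost model (illustrative constants, NOT measurements) for why the
# FP8 speedup depends on input shape.

def speedup(work, k=1.0, overhead=50.0):
    """Ratio of plain-linear cost to FP8-linear cost for a given
    amount of matmul work (arbitrary units)."""
    plain = k * work                 # plain linear: cost scales with work
    fp8 = overhead + 0.5 * k * work  # FP8: fixed overhead + ~2x faster GEMM
    return plain / fp8

for work in (10, 100, 1000, 10000):
    print(work, round(speedup(work), 2))
```

Under this model, small shapes give a speedup below 1x (FP8 is slower) and large shapes approach the 2x GEMM advantage, which matches the pattern of the speedup changing with input shape.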
Thank you for your reply. I now understand why the inference speed of FP8 is slower.
I am currently using torch.profiler to measure the time taken by inference on both the CPU and GPU during code execution. The detailed results and code are shown below. From...
I have the same question.
I think the extra time is due to the quantization and dequantization layers that the FP8 path introduces. @DDDaar
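A minimal pure-Python sketch of the extra steps an FP8 path performs around the matmul. This is illustrative only; the function names and the integer-rounding scheme are simplified assumptions, not the actual Transformer Engine kernels:

```python
def quantize(x, scale):
    # Scale and round, mimicking the cast to a narrow FP8 range.
    return [round(v * scale) for v in x]

def fp8_dot(x, w, scale_x, scale_w):
    qx = quantize(x, scale_x)                 # extra step 1: quantize input
    qw = quantize(w, scale_w)                 # extra step 2: quantize weight
    acc = sum(a * b for a, b in zip(qx, qw))  # low-precision dot product
    return acc / (scale_x * scale_w)          # extra step 3: dequantize result

x = [0.1, 0.2, 0.3]
w = [1.0, 2.0, 3.0]
print(fp8_dot(x, w, scale_x=100.0, scale_w=10.0))  # 1.4, same as the FP32 dot product
```

For small input shapes, these extra quantize/dequantize steps (each a separate kernel launch on the GPU) can dominate the total time, which is consistent with FP8 inference being slower in those cases.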