Jiangtao lv

7 comments by Jiangtao lv

Hello everyone, I've discovered a new speed issue. When my input shape changes, the speedup of te's Linear over torch's Linear varies. Here is the code: ``` import...
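A minimal timing harness for this kind of shape-dependent comparison might look like the sketch below. It is an assumption about the truncated code above: `transformer_engine.pytorch.Linear` is not imported here, so a second `torch.nn.Linear` stands in for the "alt" module; swap it for te's Linear on a machine where TransformerEngine is installed.

```python
import time
import torch

def bench(module, x, iters=50, warmup=10):
    """Average forward time in milliseconds for one module on one input."""
    with torch.no_grad():
        for _ in range(warmup):          # warm up caches / lazy init
            module(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # don't time queued kernels
        t0 = time.perf_counter()
        for _ in range(iters):
            module(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3

# Hypothetical setup: replace `alt` with transformer_engine.pytorch.Linear
# where available; torch.nn.Linear is used here so the sketch is runnable.
base = torch.nn.Linear(1024, 1024)
alt = torch.nn.Linear(1024, 1024)

for batch in (8, 128, 2048):             # vary the input shape
    x = torch.randn(batch, 1024)
    print(f"batch={batch}: torch {bench(base, x):.3f} ms, "
          f"alt {bench(alt, x):.3f} ms")
```

Because the relative speedup depends on batch size, timing several shapes in one run (as above) makes the pattern visible directly.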

Thank you for your reply. I now understand why the inference speed of FP8 is slower.

I am currently using torch.profiler to measure the CPU and GPU time spent during inference. The detailed results and code are shown below. From...
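For reference, a minimal sketch of the kind of measurement described above, using `torch.profiler` on the CPU side only (add `ProfilerActivity.CUDA` to `activities` when running on a GPU); the `torch.nn.Linear` model and shapes are placeholders, not the original code:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)

# Record per-op CPU timings for one forward pass.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        y = model(x)

# key_averages() aggregates events per op; the table shows where
# the wall-clock time actually goes.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```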

I think the extra time is due to the introduction of quantization and dequantization layers. @DDDaar