Chen Fu

12 comments by Chen Fu

Thanks! But why are some of the cusparselt calls protected and others not?

> LGTM. Please fix the Lint/Python format issue. Just FYI. It can usually pass the check after running black and isort: https://github.com/microsoft/onnxruntime/blob/0c6037b5abe571fc43a55ef7a9d2f846820fbe5d/docs/Coding_Conventions_and_Standards.md#python-code-style

I did run black and isort before each...

Changing so many operators all at once has too big of a performance impact. I suggest only modifying one operator first. And we need to have a set of 1p...

im2col/col2im are almost always accompanied by GEMM. We should partition GEMM and im2col/col2im together. This way, we have one single parallel for in the operator instead of two back to...
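A minimal sketch of the idea above, assuming a single-channel 1-D convolution and plain Python lists. The names `im2col_tile`, `gemm`, and `conv1d_fused` are hypothetical and are not taken from onnxruntime. Each task builds the im2col columns for its own tile of output positions and immediately multiplies them by the weights, so there is one parallel loop rather than an im2col loop followed by a GEMM loop. Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def im2col_tile(x, kernel, start, stop):
    # Columns for output positions [start, stop): col[k][j] = x[start + j + k]
    return [[x[p + k] for p in range(start, stop)] for k in range(kernel)]


def gemm(w, col):
    # Plain (M x K) @ (K x T) matrix product on nested lists.
    width = len(col[0])
    return [
        [sum(row[k] * col[k][j] for k in range(len(row))) for j in range(width)]
        for row in w
    ]


def conv1d_fused(x, w, tile=2, workers=4):
    # w holds M filters, each with `kernel` weights (hypothetical layout).
    kernel = len(w[0])
    n_out = len(x) - kernel + 1
    bounds = [(s, min(s + tile, n_out)) for s in range(0, n_out, tile)]

    def work(bound):
        # im2col and GEMM for the same tile run back to back in one task.
        start, stop = bound
        return gemm(w, im2col_tile(x, kernel, start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, bounds))  # the single parallel loop

    # Stitch the per-tile results back together along the output axis.
    return [[v for part in parts for v in part[m]] for m in range(len(w))]
```

For example, `conv1d_fused([1, 2, 3, 4, 5], [[1, 0, -1], [1, 1, 1]])` gives the same result regardless of the `tile` size, since the tiling only changes how the work is partitioned.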

> > Changing so many operators all at once has too big of a performance impact. I suggest only modifying one operator first. And we need to have a set...

Yes, there is a huge performance drop when the separation of Q-DQ nodes prevents operator fusion from working. For example, in https://github.com/microsoft/onnxruntime/issues/14707 a very simple model ran more than twice as slow.

Thanks for the info. This is a surprise. Here we are actually leveraging pytorch cpuinfo. This library is used in both pytorch and tensorflow. Do you guys have knowledge about...

@glefundes and @jcreinhold, could you also file this issue with the pytorch cpuinfo repo while I prepare a PR to work around it?