Chen Fu comments

Results: 12 comments by Chen Fu

warning: 'cpuinfo_arm_fixup_raspberry_pi_chipset' reading 64 bytes from a region of size 9

@a76yyyy

Why is cublasMMWrapper operations protected by mutex

Thanks! But why are some of the cusparselt calls protected while others are not?

QDQ debugger - activations compare

> LGTM. Please fix the Lint/Python format issue. Just FYI. It can usually pass the check after running black and isort: https://github.com/microsoft/onnxruntime/blob/0c6037b5abe571fc43a55ef7a9d2f846820fbe5d/docs/Coding_Conventions_and_Standards.md#python-code-style

I did run black and isort before each...

Partition convolution utilities for parallel processing

Changing so many operators all at once has too big of a performance impact. I suggest only modifying one operator first. And we need to have a set of 1p...

Partition convolution utilities for parallel processing

im2col/col2im are almost always accompanied by GEMM. We should partition GEMM and im2col/col2im together. This way, we have one single parallel for in the operator instead of two back to...
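As a rough illustration of the idea in the comment above, the sketch below fuses the im2col step and the multiply into one parallel loop over chunks of output rows, rather than running one parallel pass for im2col and a second for GEMM. It is a minimal pure-Python sketch under stated assumptions, not ONNX Runtime's implementation: the function name `conv2d_fused`, the `num_workers` parameter, and the single-channel, single-filter, stride-1 setup are all hypothetical, and Python threads only show the loop structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def conv2d_fused(image, kernel, num_workers=4):
    """Valid 2-D cross-correlation, single channel, single filter, stride 1.

    Hypothetical sketch: output rows are split into chunks, and each worker
    gathers the im2col patches for its chunk and multiplies them by the
    flattened kernel right away, so there is only one parallel loop.
    """
    height, width = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = height - kh + 1, width - kw + 1
    flat_kernel = [v for row in kernel for v in row]

    def work(row_range):
        start, stop = row_range
        rows = []
        for oy in range(start, stop):
            out_row = []
            for ox in range(out_w):
                # im2col step: gather one input patch as a flat vector.
                patch = [image[oy + ky][ox + kx]
                         for ky in range(kh) for kx in range(kw)]
                # GEMM step: with one filter this reduces to a dot product.
                out_row.append(sum(p * k for p, k in zip(patch, flat_kernel)))
            rows.append(out_row)
        return rows

    # Partition output rows once; the same partition serves both steps.
    chunk = max(1, -(-out_h // num_workers))  # ceiling division
    ranges = [(s, min(s + chunk, out_h)) for s in range(0, out_h, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(work, ranges))
    return [row for part in parts for row in part]
```

The design point is that each worker's patch data is consumed immediately after it is gathered, so the work is partitioned once and no full im2col buffer has to be produced by one parallel loop and then read back by another.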

Partition convolution utilities for parallel processing

> > Changing so many operators all at once has too big of a performance impact. I suggest only modifying one operator first. And we need to have a set...

Quantized model extra node emitted between Q-DQ pair

Yes, there is a huge performance drop when the separation of Q-DQ nodes prevents operator fusion from working. For example, in https://github.com/microsoft/onnxruntime/issues/14707 a very simple model ran more than twice as slow.

Quantized model extra node emitted between Q-DQ pair

Hi folks, any update? @hoangtv2000

Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash

Thanks for the info. This is a surprise. Here we are actually leveraging pytorch cpuinfo. This library is used in both pytorch and tensorflow. Do you guys have knowledge about...

Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash

@glefundes and @jcreinhold could you also file this issue in the pytorch cpuinfo repo while I prepare a PR to work around it?