wejoncy

Results 22 comments of wejoncy

@radikalliberal It's caused by FPU_DENORMAL. [onnxruntime-1.13.0-cp37-cp37m-linux_x86_64.whl.zip](https://github.com/microsoft/onnxruntime/files/9624149/onnxruntime-1.13.0-cp37-cp37m-linux_x86_64.whl.zip) I built this wheel with FPU denormal detection disabled, and it achieves the same performance. How to use it: once you install that WHL ```...
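For context, subnormal (denormal) floats are the values between zero and the smallest normal float; many CPUs handle them through a slow microcode path, which is why flushing them to zero (or disabling the denormal check, as in the wheel above) can restore performance. A minimal NumPy sketch of what a float32 subnormal looks like (not the actual onnxruntime code path):

```python
import numpy as np

# Smallest positive *normal* float32 (about 1.18e-38).
tiny = np.finfo(np.float32).tiny

# Halving it underflows into the subnormal (denormal) range.
# With FTZ/DAZ (flush-to-zero / denormals-are-zero) set in the FPU
# control register, this would instead be flushed to exactly 0.0.
sub = np.float32(tiny) / np.float32(2)

print(sub > 0, sub < tiny)  # positive, but below the normal range
```

On a default (non-flushing) build, `sub` stays a nonzero subnormal; operations that repeatedly touch such values are where the slowdown shows up.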

@yufenglee Do you think we need to disable the FPU exception for some Ops?

> Thanks @wejoncy ! What do I have to take into account to build this myself? You can build it from my personal branch if you want: https://github.com/microsoft/onnxruntime/tree/jicwen/xnnpack_multithreading_v2 And Enable...

Hi, it might be better to file this issue on the AutoGPTQ repo. But you can try [QLLM](https://github.com/wejoncy/QLLM), which is also a good way to quantize a model for vLLM serving. ```...

Hi, if anyone wants to try GPTQ quantization in vLLM, please use this repo [QLLM](https://github.com/wejoncy/QLLM) to quantize the model (LLaMA); it is compatible with AWQ in vLLM. And of course you can select...

> > Hi, if anyone wants to try GPTQ quantization in vLLM, please use this repo [QLLM](https://github.com/wejoncy/QLLM) to quantize the model (LLaMA); it is compatible with AWQ in vLLM. And of course you...

> Thanks for your issue! > > 224 should be the right output shape because floor(input_dimension * scale) = floor(706 * (224/706)) = floor(224) = 224, however, the limited precision...

> I see. Thanks for your explanation. I found that Torch uses `double` to represent scale_factor ([c10::optional scale_factors](https://github.com/pytorch/pytorch/blob/652af5ec15b81c39ec7413519d0ce9938d87bcf1/aten/src/ATen/native/UpSampleBilinear2d.cpp#L165)), so it behaves correctly. Would it be possible to align with...
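The precision point is easy to check directly: in float32 the rounded scale lands just below 224/706, so the product falls short of 224 and the floor truncates to 223, while float64 recovers 224 exactly. A small sketch (not the actual ORT/Torch resize code path):

```python
import math
import numpy as np

# float64: 706 * round(224/706) rounds back to exactly 224.0,
# so floor gives the expected output size.
scale64 = 224 / 706
print(math.floor(706 * scale64))              # 224

# float32: the rounded scale is slightly low, the product lands at
# 223.99998474..., and floor truncates to 223 -- the reported bug.
scale32 = np.float32(224) / np.float32(706)
print(math.floor(np.float32(706) * scale32))  # 223
```

This is why computing the scale factor in `double` (as Torch does) sidesteps the off-by-one output shape.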

Some CoreML operators were supported by https://github.com/microsoft/onnxruntime/pull/22710 https://github.com/microsoft/onnxruntime/pull/22480 https://github.com/microsoft/onnxruntime/pull/22068