wejoncy
@radikalliberal It's caused by FPU_DENORMAL. [onnxruntime-1.13.0-cp37-cp37m-linux_x86_64.whl.zip](https://github.com/microsoft/onnxruntime/files/9624149/onnxruntime-1.13.0-cp37-cp37m-linux_x86_64.whl.zip) I built this wheel to disable FPU denormal detection, and it achieves the same performance. How to use it: once you install that WHL ```...
@yufenglee Do you think we need to disable FPU exceptions for some ops?
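For context on the FPU_DENORMAL point above, a minimal plain-Python sketch (not onnxruntime code) of what a float32 denormal (subnormal) value is — the kind of value the denormal detection flags, and that flush-to-zero modes discard for speed:

```python
import struct

# Smallest positive float32 subnormal: exponent bits all zero, mantissa = 1.
subnormal = struct.unpack('<f', struct.pack('<I', 0x00000001))[0]
# Smallest positive *normal* float32: exponent field = 1, mantissa = 0.
smallest_normal = struct.unpack('<f', struct.pack('<I', 0x00800000))[0]

assert subnormal == 2.0 ** -149           # sits below the normal float32 range
assert 0.0 < subnormal < smallest_normal  # representable, but at reduced precision
# Flush-to-zero / denormals-are-zero CPU modes treat such values as 0.0,
# trading this tiny precision tail for much faster arithmetic.
```

Hardware handles subnormals slowly (often via microcode assists), which is why flushing them is a common performance workaround.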
> Thanks @wejoncy ! What do I have to take into account to build this myself? You can build it from my personal branch if you want: https://github.com/microsoft/onnxruntime/tree/jicwen/xnnpack_multithreading_v2 And enable...
Hi, it might be better to file this issue on the AutoGPTQ repo. But you can try [QLLM](https://github.com/wejoncy/QLLM), which also works well for quantizing models for vLLM serving. ```...
Hi, if anyone wants to try GPTQ quantization in vLLM, please use this repo [QLLM](https://github.com/wejoncy/QLLM) to quantize the model (LLaMA), and it will be compatible with AWQ in vLLM. And of course you can select...
> > Hi, if anyone wants to try GPTQ quantization in vLLM, please use this repo [QLLM](https://github.com/wejoncy/QLLM) to quantize the model (LLaMA), and it will be compatible with AWQ in vLLM. And of course you...
Hi @daquexian, I'd like to have your comments.
> Thanks for your issue! > > 224 should be the right output shape because floor(input_dimension * scale) = floor(706 * (224/706)) = floor(224) = 224, however, the limited precision...
> I see. Thanks for your explanation. I found that Torch uses `double` to represent scale_factor ([c10::optional scale_factors](https://github.com/pytorch/pytorch/blob/652af5ec15b81c39ec7413519d0ce9938d87bcf1/aten/src/ATen/native/UpSampleBilinear2d.cpp#L165)), so it behaves well. Would it be possible to align with...
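To illustrate the precision point in this exchange, here is a small sketch (hypothetical values, not onnxruntime internals) comparing the output-shape computation with a scale rounded to float32 versus the full-precision double scale that PyTorch stores:

```python
import math
import struct

in_dim = 706

# Scale held at float64 precision, as PyTorch's double scale_factor is.
scale64 = 224 / 706
# The same scale rounded through float32 (a hypothetical lower-precision path).
scale32 = struct.unpack('<f', struct.pack('<f', scale64))[0]

# The exact product is 224, but floor() of a slightly-low product undershoots.
out64 = math.floor(in_dim * scale64)
out32 = math.floor(in_dim * scale32)

# Rounding instead of flooring absorbs the representation error either way.
assert round(in_dim * scale64) == 224
assert round(in_dim * scale32) == 224
```

With a float32 scale the product can land just below 224, so `floor` may yield 223; keeping the scale in `double` (or rounding) keeps the computed shape stable.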
Some CoreML operators were supported by https://github.com/microsoft/onnxruntime/pull/22710 https://github.com/microsoft/onnxruntime/pull/22480 https://github.com/microsoft/onnxruntime/pull/22068