Yufeng Li

86 comments by Yufeng Li

And for the issue in the original post, do you run on Windows or Linux?

@VishalX, just FYI: it turns out something is wrong with mmap on Windows. If I turn off mmap, Asymmetric works on Windows. You can try it out with this branch if...

> @yufenglee, I tried Asymmetric BlockWise, RTN & GPTQ, with the above fix. Responses for all these include German sentences/words. Do you think this is due to quantization loss only?...
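To get a feel for how much information asymmetric block-wise 4-bit round-to-nearest (RTN) quantization throws away, here is a minimal pure-Python sketch. It is not ONNX Runtime's quantizer: the block size of 32, widening each block's range to include 0, and the helper names are all assumptions for illustration. The per-element round-trip error is bounded by half of each block's scale, so blocks with a wide value range lose the most precision. GPTQ differs from this by compensating for rounding error using calibration data.

```python
def quantize_block_asym(block, bits=4):
    """Asymmetric RTN quantization of one block (illustrative sketch only)."""
    qmax = (1 << bits) - 1  # 15 for 4-bit
    # Assumption: widen the range to include 0 so the zero point lands in [0, qmax].
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        # All-zero block: any scale works; use 1.0 with zero point 0.
        return [0] * len(block), 1.0, 0
    zero_point = min(max(round(-lo / scale), 0), qmax)
    q = [min(max(round(x / scale) + zero_point, 0), qmax) for x in block]
    return q, scale, zero_point


def dequantize_block(q, scale, zero_point):
    """Map quantized integers back to floats."""
    return [(v - zero_point) * scale for v in q]


def quantize_blockwise(values, block_size=32, bits=4):
    """Split values into blocks; each block gets its own scale and zero point."""
    return [
        quantize_block_asym(values[i:i + block_size], bits)
        for i in range(0, len(values), block_size)
    ]


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 7.0 for i in range(100)]
    for b, (q, s, zp) in enumerate(quantize_blockwise(weights)):
        orig = weights[b * 32:(b + 1) * 32]
        err = max(abs(a - d) for a, d in zip(orig, dequantize_block(q, s, zp)))
        print(f"block {b}: scale={s:.4f} zero_point={zp} max_error={err:.4f}")
```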

@SimonRelu, could you please profile the model and see if all nodes are running on GPU?
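One way to check this, sketched under assumptions: turn on ONNX Runtime's built-in profiler, run the model, and group the node events in the resulting JSON trace by execution provider. The commented lines use the real `SessionOptions.enable_profiling`, `InferenceSession`, and `end_profiling()` Python APIs, while `model.onnx` is a placeholder path. The parser assumes that each node's `*_kernel_time` event carries an `args.provider` field; verify this against a trace from your ORT version.

```python
import json
from collections import defaultdict

# Producing the trace (requires onnxruntime and a real model):
#   import onnxruntime as ort
#   so = ort.SessionOptions()
#   so.enable_profiling = True
#   sess = ort.InferenceSession("model.onnx", so,
#       providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
#   sess.run(None, inputs)
#   profile_path = sess.end_profiling()  # returns the JSON file name


def load_profile(path):
    """Read the JSON list of trace events written by the profiler."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def nodes_by_provider(events):
    """Map each execution provider to the sorted node names it executed.

    Assumption: kernel events are named '<node>_kernel_time' and list the
    provider under args['provider']; other events are ignored.
    """
    suffix = "_kernel_time"
    by_provider = defaultdict(set)
    for ev in events:
        if ev.get("cat") != "Node":
            continue
        name = ev.get("name", "")
        provider = ev.get("args", {}).get("provider")
        if provider and name.endswith(suffix):
            by_provider[provider].add(name[: -len(suffix)])
    return {p: sorted(names) for p, names in by_provider.items()}
```

Any nodes listed under `CPUExecutionProvider` fell back to CPU and are the first place to look for extra memory copies between device and host.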

Nice! Do you have a rough estimation when it will be done?

And what will AoT compilation generate, a C/C++ API plus source/.so?

> Being able to convert a HF model for 4-bit quantization would be awesome!!

The QLLM tool can convert a 4-bit HF model to ONNX: https://github.com/wejoncy/QLLM. And a tool from...

> Thanks for your report! What's the accuracy level of this model's MatMulNBits? we use the fp32

This is the tool to get the benchmark number: https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python