Yufeng Li
And for the issue in the original post, are you running on Windows or Linux?
@VishalX, just FYI: it turns out something is wrong with mmap on Windows. If I turn off mmap, Asymmetric works on Windows. You can try it out with this branch if...
> @yufenglee, I tried Asymmetric BlockWise, RTN & GPTQ, with the above fix. Responses for all these include German sentences/words. Do you think this is due to quantization loss only?...
@SimonRelu, could you please profile the model and see if all nodes are running on GPU?
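In case it helps with the profiling request above, here is a minimal sketch of how the profile could be checked. It assumes the JSON profile that ONNX Runtime writes when `enable_profiling` is set, and it assumes each node's kernel event has `cat` set to `Node`, a name ending in `_kernel_time`, and `op_name` and `provider` fields under `args`. The helper names `summarize_providers` and `nodes_not_on` are hypothetical, not part of any library.

```python
import json
from collections import Counter

KERNEL_SUFFIX = "_kernel_time"  # assumed suffix of per-node kernel events


def _kernel_events(events):
    """Yield only the per-node kernel events from a parsed profile."""
    for event in events:
        if event.get("cat") == "Node" and event.get("name", "").endswith(KERNEL_SUFFIX):
            yield event


def summarize_providers(events):
    """Count kernel events per execution provider."""
    counts = Counter()
    for event in _kernel_events(events):
        counts[event.get("args", {}).get("provider", "unknown")] += 1
    return counts


def nodes_not_on(events, gpu_provider="CUDAExecutionProvider"):
    """List (node, op, provider) for nodes that ran on another provider."""
    found = set()
    for event in _kernel_events(events):
        args = event.get("args", {})
        provider = args.get("provider", "unknown")
        if provider != gpu_provider:
            node = event["name"][: -len(KERNEL_SUFFIX)]
            found.add((node, args.get("op_name", "unknown"), provider))
    return sorted(found)


# Producing the profile (requires onnxruntime; model path and inputs are placeholders):
#   import onnxruntime as ort
#   opts = ort.SessionOptions()
#   opts.enable_profiling = True
#   sess = ort.InferenceSession("model.onnx", opts,
#                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
#   sess.run(None, inputs)
#   profile_path = sess.end_profiling()
#   with open(profile_path) as f:
#       events = json.load(f)
#   print(summarize_providers(events))
#   print(nodes_not_on(events))
```

If `nodes_not_on` returns anything for a hot op, that node fell back to another provider, which is usually worth looking at first.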
Nice! Do you have a rough estimate of when it will be done?
And what will AoT compilation generate, a C/C++ API plus source/.so?
> Being able to convert a HF model for 4-bit quantization would be awesome!! The QLLM tool can convert a 4-bit HF model to ONNX: https://github.com/wejoncy/QLLM. And a tool from...
You need to include both #10199 and #10334.
> Thanks for your report! What's the accuracy level of this model's MatMulNBits? We use fp32.
This is the tool to get the benchmark number: https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python