lunar

Results 4 issues of lunar

## ❓ Question Why is the TensorRT model slower? I tried TensorRT on an MHA (multi-head attention) model, but found it to be even slower than the JIT-scripted model. ##...

question
performance

Hi, I noticed that in 6.1 of your paper, because optimizing a likelihood function that includes both **Z** and **V** is inefficient, you chose to divide the process into two stages. First,...

Re-pull request of https://github.com/PanQiWei/AutoGPTQ/pull/139 to avoid conflicts.

First of all, thanks for your notes. There is a problem with the Markdown syntax, so the code in this formula does not display correctly.