lunar issues

Results: 4 issues of lunar

❓ [Question] Why TensorRT model is slower?

## ❓ Question Why is the TensorRT model slower? I tried TensorRT on an MHA (multi-head attention) model, but found that it is even slower than the JIT-scripted model. ##...

question

performance

Problem about input embeddings generated by other algo.

Hi, I noticed that in 6.1 of your paper, because optimizing a likelihood function that includes both **Z** and **V** is inefficient, you chose to divide the process into two stages. First,...

Add transpose operator when replace Conv1d with qlinear_cuda_old

Resubmission of pull request https://github.com/PanQiWei/AutoGPTQ/pull/139 to avoid conflicts.

Something wrong with Markdown

First, thanks for your notes. There is a problem with the Markdown syntax, so the code in this formula does not display correctly.