JamshedQurbonboev comments


Results 2 comments of


Server: enable lookup decoding

How much does this PR speed up token generation? As far as I am aware, #5479 had a rather tiny speedup. And when do you think this PR will be ready to be...

Server: enable lookup decoding

Thanks for improving the performance of llama.cpp. It seems that you were correct: lookup decoding improves speed but adds a constant overhead, so larger models benefit more from it. How does...