Manu Maheshwari
3 comments by Manu Maheshwari
Did this fix work? It did not work for me.
For llama2-7b with q4fp16_1 quantization and a context length of 128, these are the context phase time differences (I installed it using pip around a week ago): MLC-LLM - 266.3...
The GEMM times themselves are very large for the context phase.