Yufeng Li comments

Results: 73 comments of Yufeng Li

Performance Gap between Neural Speed Matmul Operator and Llama.cpp Operator

> I will measure the performance with Neural Speed and llama.cpp. BTW, are you aware that llama.cpp uses AVX_VNNI for computation, which is equivalent to accuracy_level=COMP_INT8? The target machine doesn't...

Performance Gap between Neural Speed Matmul Operator and Llama.cpp Operator

Since this is not an issue for bit widths below 8, it should be fine. We mainly use blockwise quantization for bit widths below 8.
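
For readers unfamiliar with the term, here is a minimal sketch of what blockwise quantization generally means, assuming symmetric 4-bit codes and a block size of 32. The function names, the block size, and the scale rule are illustrative assumptions, not the kernel used by Neural Speed or llama.cpp.

```python
def quantize_blockwise(values, block_size=32, bits=4):
    """Symmetric blockwise quantization: one float scale per block (illustrative)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit codes
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit codes
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Assumed scale rule: map the largest magnitude in the block to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = [min(qmax, max(qmin, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Rebuild approximate float values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


# Hypothetical weights, used only to exercise the two functions.
weights = [0.01 * i - 0.3 for i in range(64)]
blocks = quantize_blockwise(weights, block_size=32, bits=4)
restored = dequantize_blockwise(blocks)
```

Because each block carries its own scale, a large value in one block does not reduce the resolution available to the others, which is why this scheme is commonly paired with low bit widths.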