Yufeng Li

Results 73 comments of Yufeng Li

> I will measure the performance with NeuralSpeed and llama.cpp. BTW, are you aware that llama.cpp uses AVX_VNNI for computation, which is equivalent to accuracy_level=COMP_INT8? The target machine doesn't...

[like] Yufeng Li reacted to your message: From: luoyu-intel ***@***.***> Sent: Tuesday, April 9, 2024 6:21:54 AM To: intel/neural-speed ***@***.***> Cc: Comment ***@***.***> Subject: Re: [intel/neural-speed] Performance Gap between...

Since this won’t be an issue for bit widths below 8, it should be fine. We mainly use blockwise quantization for bit widths below 8.
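
For readers unfamiliar with the term, here is a minimal sketch of what blockwise quantization generally means: the values are split into fixed-size blocks, and each block gets its own scale. The function names, the block size of 32, and the symmetric 4-bit scheme are all assumptions chosen for illustration; this is not the implementation used by any of the libraries discussed here.

```python
def quantize_blockwise(values, block_size=32, bits=4):
    """Symmetric blockwise quantization (illustrative sketch only).

    Splits `values` into blocks of `block_size` and stores one float
    scale per block plus the signed integer codes for that block.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed codes
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # An all-zero block would give scale 0; fall back to 1.0.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = [max(-qmax, min(qmax, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Reconstruct approximate float values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


# Two blocks with very different magnitudes: each keeps its own scale,
# so the small values are not crushed by the large ones.
weights = [0.1, -0.2, 0.05, 0.4, 10.0, -8.0, 3.0, 1.0]
packed = quantize_blockwise(weights, block_size=4, bits=4)
restored = dequantize_blockwise(packed)
```

Because the scale is chosen per block, the rounding error of each value is bounded by half of its own block's scale, which is why small block sizes matter more as the bit width shrinks.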