Sasank Chilamkurthy


Making a list of benchmark comparisons:

- [x] OneMKL TFLOPS
- [x] PyTorch TFLOPS
- [x] llama.cpp mistral-7b int8 tok/s
- [ ] BigDL mistral-7b int8 tok/s

Lemme know...

I have benchmarked mistral 7b int4 on an M2 Air, an Intel 12400, and an Arc 770 16GB. I used [llama-bench](https://github.com/ggerganov/llama.cpp/tree/master/examples/llama-bench) with the mistral 7b model from [here](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/blob/main/mistral-7b-v0.1.Q4_0.gguf) to measure tok/s for...
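For anyone wanting to reproduce these numbers, a typical llama-bench invocation looks like the sketch below. The model path is an assumption — point it at wherever you saved the GGUF file linked above.

```shell
# Hypothetical local path to the downloaded GGUF file; adjust to your setup.
MODEL=./models/mistral-7b-v0.1.Q4_0.gguf

# -p 512 benchmarks prompt processing (pp) tok/s over a 512-token prompt;
# -n 128 benchmarks text generation (tg) tok/s over 128 generated tokens.
./llama-bench -m "$MODEL" -p 512 -n 128
```

llama-bench prints a markdown table of pp/tg tok/s, which is what the per-device numbers above come from.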

Vulkan results are interesting! Did you follow the instructions from here? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#vulkan I will reproduce the results with llama-bench. By the way, I created an issue about performance at https://github.com/ggerganov/llama.cpp/issues/5480....
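For reference, the Vulkan backend is a compile-time option, and the flag name has changed across llama.cpp versions — check the README of your checkout. A sketch of the build step, assuming a recent CMake-based tree:

```shell
# Build llama.cpp with the Vulkan backend enabled.
# The option was LLAMA_VULKAN in early-2024 trees and GGML_VULKAN in later ones;
# use whichever your checkout's README documents.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```

This requires the Vulkan SDK (headers and loader) to be installed and discoverable by CMake.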

I observed the last run not finishing for other tests as well. But you're right, I'm getting very slow tok/s in llama-bench. Makes you wonder if llama-bench is accurate! Vulkan0: Intel(R)...

Wow this is amazing!

Honestly, it's weird that this works. Did you just find a compiler bug in gcc?!

It works with your patch!

> Side note: clang++ is going to get deprecated soon in the future oneAPI toolkit releases, for more information you can check the README file...