Faisal Zaghloul

13 comments by Faisal Zaghloul

This is with OpenMP disabled on the threadpool branch:

$ LLAMA_CUDA=1 ./scripts/compare-commits.sh master threadpool -nkvo 0,1 -m models/7B/llama7b.gguf

| GPU | Model | NKVO | Test | t/s master |...

Can confirm it's slightly worse on stories 260K:

| GPU | Model | NKVO | Test | t/s master | t/s threadpool | Speedup |
|:--------------------|:---------------------------|:-------|:-------|-------------:|-----------------:|----------:|
| RTX 3060 Laptop...

@ggerganov @slaren Do you have any more suggestions/comments/concerns regarding this PR? I would suggest we merge it in and create issues to track BLAS/BLIS improvements and/or moving to C++ synchronization...