
Programs released by the release page are slower than those built locally.

rankaiyx opened this issue 2 years ago • 9 comments

On the Windows platform, the prompt eval speed of the programs from the release page is about 10% slower than that of programs compiled locally with w64devkit. Both the AVX and AVX2 builds seem to have this problem.

release page:
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0
llama_print_timings: prompt eval time = 2668.44 ms / 8 tokens ( 333.55 ms per token)

compiled locally with w64devkit:
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0
llama_print_timings: prompt eval time = 2451.74 ms / 8 tokens ( 306.47 ms per token)

rankaiyx avatar May 25 '23 07:05 rankaiyx

I don't think this is very strange, right? A local build is compiled with settings optimised for your exact setup, while the release builds are optimised for more general setups.

Azeirah avatar May 25 '23 20:05 Azeirah

Nothing is optimized for a certain machine ... it's just that the compiler is better than the one used for the release binaries.

mirek190 avatar May 25 '23 21:05 mirek190

Is it possible to add a workflow whose compilation environment is w64devkit?

rankaiyx avatar May 26 '23 00:05 rankaiyx

The difference is the compiler, the Release builds are built using Microsoft's MSVC compiler while w64devkit uses GCC.

Is it possible to add a workflow whose compilation environment is w64devkit?

Perhaps, or maybe MSYS2.

SlyEcho avatar May 26 '23 21:05 SlyEcho

There's nothing actionable here. To see why this happens, read up on "mtune" and "march" here: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html. Local builds are built with -march=native -mtune=native, which risks crashing or running slowly on the wrong CPU model, while the release builds, to avoid crashing on other people's CPUs, are not built with -march.

I'd close this issue if I could.

divinity76 avatar May 27 '23 11:05 divinity76
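For context on why the two builds above can report identical feature lists yet differ in speed: that "AVX = 1 | AVX2 = 1 | ..." output reflects compile-time macros set by the build flags, not a runtime CPU check, so two binaries can print the same list while still being tuned differently. A minimal, hypothetical sketch of that idea, assuming GCC or Clang on x86 (not code taken from llama.cpp):

```c
/* Hypothetical sketch, not code from llama.cpp: the feature line is built from
 * compile-time macros such as __AVX2__ and __FMA__, which the compiler defines
 * according to the -march / -m<feature> flags. Two binaries can therefore print
 * the same feature list while being scheduled and tuned quite differently. */
#include <stdio.h>

int main(void) {
#if defined(__AVX__)
    const int avx = 1;
#else
    const int avx = 0;
#endif
#if defined(__AVX2__)
    const int avx2 = 1;
#else
    const int avx2 = 0;
#endif
#if defined(__FMA__)
    const int fma = 1;
#else
    const int fma = 0;
#endif
    printf("AVX = %d | AVX2 = %d | FMA = %d\n", avx, avx2, fma);
    return 0;
}
```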

Fwiw, the Intel ICC compiler does interesting tricks to run fast on every CPU: it checks which CPU it is running on at startup, then rewrites instructions in RAM at runtime, basically applying -march-type optimizations at run time. However, I don't know of any other compiler supporting that (GCC/MSVC/Clang don't, AFAIK; Microsoft's .NET runtime for C# does something similar, though).

divinity76 avatar May 27 '23 12:05 divinity76
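What this comment describes is runtime CPU dispatch. Even without a compiler that does it automatically, a program can implement the same idea by hand with the GCC/Clang built-ins __builtin_cpu_init and __builtin_cpu_supports. A minimal sketch of that pattern, with made-up function names purely for illustration (not how llama.cpp is built):

```c
/* Hypothetical sketch of manual runtime dispatch with GCC/Clang built-ins:
 * one binary carries a generic path and an AVX2/FMA path and picks between
 * them at startup, which is roughly the effect the ICC dispatcher achieves
 * automatically. Function names here are invented for illustration. */
#include <stdio.h>

__attribute__((target("avx2,fma")))
static void matmul_avx2(void) {
    /* an AVX2/FMA implementation would live here */
    puts("using the AVX2/FMA path");
}

static void matmul_generic(void) {
    /* portable fallback for older CPUs */
    puts("using the generic path");
}

int main(void) {
    __builtin_cpu_init();  /* initialize the CPU feature data used below */
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        matmul_avx2();
    else
        matmul_generic();
    return 0;
}
```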

Fwiw, the Intel ICC compiler does interesting tricks to run fast on every CPU: it checks which CPU it is running on at startup, then rewrites instructions in RAM at runtime, basically applying -march-type optimizations at run time. However, I don't know of any other compiler supporting that (GCC/MSVC/Clang don't, AFAIK; Microsoft's .NET runtime for C# does something similar, though).

Is it possible to add a workflow whose compilation environment is the Intel ICC compiler?

rankaiyx avatar May 27 '23 12:05 rankaiyx

Is it possible to add a workflow whose compilation environment is the Intel ICC compiler?

I don't know, but a quick Google search suggests the answer is yes, it's possible: https://neelravi.com/post/oneapi-github-workflow/

divinity76 avatar May 27 '23 12:05 divinity76

Make will add -march=native -mtune=native, which the CMake version doesn't. We would have to compare a GCC CMake build against an MSVC CMake build and see whether the speed is the same.

SlyEcho avatar May 27 '23 14:05 SlyEcho

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 09 '24 01:04 github-actions[bot]