Binaries from the release page are slower than locally built ones.
On Windows, prompt eval with the binaries from the release page is about 10% slower than with binaries compiled locally using w64devkit. Both the AVX and AVX2 builds seem to have this problem.
release page: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | llama_print_timings: prompt eval time = 2668.44 ms / 8 tokens ( 333.55 ms per token)
compiled locally through w64devkit: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | llama_print_timings: prompt eval time = 2451.74 ms / 8 tokens ( 306.47 ms per token)
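To put a number on the gap, here is a minimal sketch that parses the two `llama_print_timings` lines above and computes the relative slowdown. The helper names (`ms_per_token`, `slowdown_percent`) are made up for illustration and are not part of llama.cpp; only the log format and the figures come from the output above. With these numbers it works out to roughly 8.8%.

```python
import re

# Timing lines copied from the two runs above.
RELEASE_LOG = "llama_print_timings: prompt eval time = 2668.44 ms / 8 tokens ( 333.55 ms per token)"
LOCAL_LOG = "llama_print_timings: prompt eval time = 2451.74 ms / 8 tokens ( 306.47 ms per token)"

# Captures the total milliseconds and the token count from a timing line.
TIMING_RE = re.compile(r"prompt eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*tokens")


def ms_per_token(log_line: str) -> float:
    """Return prompt eval milliseconds per token for one timing line."""
    match = TIMING_RE.search(log_line)
    if match is None:
        raise ValueError("no prompt eval timing found in line")
    total_ms = float(match.group(1))
    tokens = int(match.group(2))
    return total_ms / tokens


def slowdown_percent(slow_line: str, fast_line: str) -> float:
    """Return how much slower slow_line is than fast_line, in percent."""
    return (ms_per_token(slow_line) / ms_per_token(fast_line) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"release build is {slowdown_percent(RELEASE_LOG, LOCAL_LOG):.1f}% slower")
```

Note that a single 8-token prompt is a small sample, so repeating the run a few times would give a more trustworthy figure.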
I don't think this is very strange, right? A local build is compiled with settings optimised for your exact setup, while the release builds are optimised for more general setups.
Nothing is optimized for a particular machine; it's just that the compiler used locally is better than the one used here for the release binaries.
Is it possible to add a workflow whose compilation environment is w64devkit?
The difference is the compiler: the release builds are built with Microsoft's MSVC compiler, while w64devkit uses GCC.
Is it possible to add a workflow whose compilation environment is w64devkit?
Perhaps, or maybe MSYS2.
There's nothing actionable here. To see why this happens, read up on "mtune" and "march" here: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
Also note that local builds are built with -march=native -mtune=native, which risks crashing or running slowly on a different CPU model, while the release builds are not built with -march, to avoid crashing on other people's CPUs.
I'd close this issue if I could.
Fwiw, the Intel ICC compiler does interesting tricks to run fast on every CPU: it checks which CPU it is running on at startup, then rewrites instructions in RAM at runtime, basically applying march-type optimizations at runtime. However, I don't know of any other compiler that supports this (GCC/MSVC/clang don't, afaik; Microsoft's .NET runtime for C# does something similar, though).
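For anyone unfamiliar with the idea of picking CPU-specific code at startup, here is a rough Python sketch of it. To be clear about what it is: this is plain function dispatch (choosing between variants that already exist), not rewriting instructions in memory, and it is not how ICC or llama.cpp actually implement anything. The function names and the feature set passed in are hypothetical; a real program would query the CPU (e.g. via CPUID) instead of hardcoding the set.

```python
from typing import Callable, Sequence, Set

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline variant: safe to run on any CPU."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a variant built with AVX2/FMA enabled.

    In a native binary this would be separately compiled machine code;
    here it computes the same result and only marks which path was chosen.
    """
    return sum(x * y for x, y in zip(a, b))


def select_dot_kernel(cpu_features: Set[str]) -> Kernel:
    """Pick the fastest variant the detected CPU features allow."""
    if {"avx2", "fma"} <= cpu_features:
        return dot_avx2
    return dot_generic


if __name__ == "__main__":
    # Hypothetical detection result, hardcoded for the sketch.
    detected = {"sse3", "avx", "avx2", "fma", "f16c"}
    dot = select_dot_kernel(detected)  # chosen once, at startup
    print(dot.__name__, dot([1.0, 2.0], [3.0, 4.0]))
```

The point of doing the selection once at startup is that a single binary stays safe on older CPUs while still using the faster path where the hardware supports it.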
Is it possible to add a workflow whose build environment uses the Intel ICC compiler?
I don't know, but a quick Google search suggests the answer is yes, it's possible: https://neelravi.com/post/oneapi-github-workflow/
The Makefile build adds -march=native -mtune=native, which the CMake build doesn't. Someone would have to compare a GCC CMake build against an MSVC CMake build to see whether the speed is the same.
This issue was closed because it has been inactive for 14 days since being marked as stale.