llama.cpp
Comparison: Windows Build vs. Unix Build (through WSL2)
Environment and Context
Hello! Before jumping into the subject, here's the environment I'm working with:
- Windows 10
- LLaMA-13B 4-bit (GPTQ-quantized) model
- Intel® Core™ i7-10700K [AVX | AVX2 | FMA | SSE3 | F16C]
Expected Behavior
I did some comparisons between the Windows build and the Unix build (through WSL2, Ubuntu_2204.1.8.0_x64) to see whether I could notice any differences between them.
Deterministic Settings (seed = 1)
For both builds, I used the exact same settings:
-t 14 -n 2024 -c 2024 --temp 0.2 --top_k 40 --top_p 0.6 --repeat_last_n 2048
--repeat_penalty 1.17647058824 --color --n_parts 1 -b 500 --seed 1 -p "$(cat STORY.txt)"
With the contents of STORY.txt as follows:
Here's 5 reasons that proves why video-games are good for your brain:
Test#1: Instruction set architectures
Windows:
system_info: n_threads = 14 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 |
NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
WSL2:
system_info: n_threads = 14 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 |
NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
The Unix build recognizes all of these instruction sets on my CPU, but the Windows build is missing F16C, FMA, and SSE3.
- We probably haven't enabled all of the relevant CPU instruction sets in the Windows build (and maybe in the Unix build too).
- My CPU supports more instruction sets than the ones used by the builds [MMX, SSE, SSE2, SSSE3, SSE4, SSE4.1 + SSE4.2, AES, BMI, BMI1 + BMI2, FMA3, EM64T, HT, VT-x, VT-d].
I believe we could significantly increase generation speed by enabling every instruction set extension that is advantageous for text generation.
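As far as I can tell, the system_info flags reflect what the compiler enabled at build time rather than what the CPU supports at runtime: ggml appears to check compile-time macros such as __FMA__, __F16C__ and __SSE3__, which GCC/Clang define (for example with -march=native) but which MSVC does not define even with /arch:AVX2. A minimal sketch of that kind of compile-time check (illustrative only, not the actual ggml code):
/* Sketch: compile-time feature reporting, similar in spirit to ggml's
 * ggml_cpu_has_*() helpers. Built with `gcc -march=native` or
 * `clang -march=native` this reports what the compiler enabled; built
 * with MSVC, the FMA/F16C/SSE3 macros stay undefined and print 0 even
 * on a CPU that supports them. */
#include <stdio.h>

int main(void) {
#if defined(__AVX2__)
    printf("AVX2 = 1 | ");
#else
    printf("AVX2 = 0 | ");
#endif
#if defined(__FMA__)
    printf("FMA = 1 | ");
#else
    printf("FMA = 0 | ");
#endif
#if defined(__F16C__)
    printf("F16C = 1 | ");
#else
    printf("F16C = 0 | ");
#endif
#if defined(__SSE3__)
    printf("SSE3 = 1\n");
#else
    printf("SSE3 = 0\n");
#endif
    return 0;
}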
Test#2: Reproducibility of the output
Since I used the exact same settings for both the Windows and Unix builds (refer to "Deterministic Settings (seed = 1)"), I would expect to obtain the exact same output from both.
Windows (I'll call this output "WindowsText"):
1. Video games improve hand eye coordination and reaction time, which is a skill used in everyday life.
2. They help you to focus on the task at hand by blocking out distractions around you. This helps with concentration when doing other tasks such as reading or writing an essay.
3. It improves problem solving skills because it requires players to think of different ways to solve problems. For example, if there was a puzzle that required you to find a key to unlock a door but the only way to get the key is to kill someone who has it then you would have to decide whether killing them is worth getting the key.
4. It can be very relaxing after a long day of school work so it gives you some down time from all the stressful things going on in your life.
5. It also increases creativity because they require you to come up with new ideas to complete levels. [end of text]
WSL2 (I'll call this output "UnixText"):
1. Video games improve hand eye coordination and reaction time, which is a very important skill in sports like basketball or football where you need to react quickly when the ball comes towards you.
2. It improves problem solving skills as well because it requires players to think of different ways to solve problems. For example, if there’s an obstacle blocking your way then you have to find another route around it. This helps with real life situations too!
3. It also increases attention span by keeping kids focused on one task at a time. If they get distracted while playing a game then they won’t be able to complete their goal.
4. It can help develop social skills such as teamwork and communication. Players must work together to accomplish goals. They learn how to communicate effectively through voice chat so that everyone knows what needs to happen next.
5. Lastly, it teaches patience. Sometimes you may not know exactly what to do right away but after some practice you will eventually figure out how to beat the level. [end of text]
That's not the case at all: you get a different output depending on whether you're using the Windows build or the Unix build.
I believe the Unix build has better outputs than the Windows one for the following reasons:
- It mentions the importance of hand-eye coordination in sports like basketball or football, which are common activities that many people can relate to.
- UnixText provides a more comprehensive list of benefits. It discusses the improvement of attention span, the development of social skills, and the teaching of patience, which are all valuable skills that were not mentioned in WindowsText.
- The structure of UnixText is clearer and more concise, which makes it easier to read and understand.
Test#3: Speed
Windows:
llama_print_timings: load time = 21085.73 ms
llama_print_timings: sample time = 734.50 ms / 194 runs ( 3.79 ms per run)
llama_print_timings: prompt eval time = 5380.24 ms / 20 tokens ( 269.01 ms per token)
llama_print_timings: eval time = 93395.22 ms / 193 runs ( 483.91 ms per run)
llama_print_timings: total time = 121975.58 ms
WSL2:
llama_print_timings: load time = 30968.40 ms
llama_print_timings: sample time = 2342.41 ms / 219 runs ( 10.70 ms per run)
llama_print_timings: prompt eval time = 4668.72 ms / 20 tokens ( 233.44 ms per token)
llama_print_timings: eval time = 96435.62 ms / 218 runs ( 442.37 ms per run)
llama_print_timings: total time = 137830.02 ms
- Load time: Windows is 1.46 times faster than Unix.
- Sample time: Windows is 2.82 times faster than Unix.
- Prompt eval time: Unix is 1.15 times faster than Windows.
- Eval time (most important value): Unix is 1.09 times faster than Windows.
Unix tends to be faster than Windows, which may be due to the absence of F16C, FMA, and SSE3 support in the Windows build.
Conclusion
- The builds don't enable all of the instruction sets available on the CPU; if we fix that, we could probably see a significant increase in speed.
- Windows and WSL2 don't produce the same output, and I believe the Unix build gives better results. This is somewhat concerning because the model is supposed to behave identically regardless of the operating system.
- Unix is a bit faster than Windows, but that comparison would be more relevant if both builds used the same instruction set implementations.
I think the discrepancies between the two operating systems were not only observed by me but also by others, and efforts are underway to address them. Nonetheless, I found it interesting to witness such differences between the two systems.
The generation differences may be explained by the lack of FMA and F16C/CVT16 on MSVC. #375 should solve that.
Wouldn't this suggest that Windows should be the less performant of the two? Because in this case he's seeing that Windows performs better without the compiler flag optimizations, as I understand it.
Also, I'm not sure that comparing Windows to WSL is equivalent to comparing Windows to Unix. Might there be inherent performance losses when using WSL without implementing specific adaptations for it?
Unrelated to performance, the determinism does seem like an issue.
@MillionthOdin16 Sorry, I messed up: I didn't use the same models for both at the beginning. I fixed that and arrived at a different conclusion (I modified the post accordingly).
Regardless, WSL2 is running in a VM, a small performance penalty is expected.
@BadisG do you mean that the runs are deterministic? Or that the performance of both are similar enough? Or both?
@MillionthOdin16 What I meant by "deterministic" runs is that I fixed the seed to 1, and that alone is supposed to give you the exact same output regardless of anything else. But that isn't the case here: the Unix output and the Windows output are different for a fixed seed, and it's not supposed to behave like this.
There are always going to be small differences in the generation between different implementations, in this case it is probably because the lack of FMA on MSVC means that some functions use a different path. You should only expect determinism across multiple runs of the same binary.
For evaluating generation quality, you should use the perplexity instead. It would be an issue if there are significant differences in the perplexity of different implementations.
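To make the "different path" point concrete, here is a tiny standalone example (illustrative only, not llama.cpp code): a fused multiply-add rounds once, while a separate multiply and add rounds twice, so the two paths can produce slightly different floats from the same inputs, and a tiny difference in the logits is enough to eventually change which token gets sampled.
/* Illustration: FMA rounds once, mul+add rounds twice, so the two code
 * paths can disagree in the last bits. Compile with e.g. `cc fma_diff.c -lm`. */
#include <math.h>
#include <stdio.h>

int main(void) {
    float a = 1.0f / 3.0f;
    float b = 3.0f;
    float c = -1.0f;

    volatile float prod = a * b;     /* volatile blocks FMA contraction */
    float separate = prod + c;       /* two roundings: typically 0      */
    float fused    = fmaf(a, b, c);  /* one rounding: ~2.98e-08         */

    printf("separate = %.9g\n", separate);
    printf("fused    = %.9g\n", fused);
    return 0;
}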
You get different output text because of the temperature setting. To really get deterministic outputs you have to set the temperature nearly to zero (but not exactly zero, because then you get garbage out), even if the seed is the same.
This seems to be a bug, or a math/rounding error, or something similar. I would usually expect the same output when the temp setting is the same between tests and the environment hasn't changed.
For deterministic testing outputs, I set mine to --temp 0.000001
@PriNova Nah, when you fix the seed (I set seed = 1) you'll get the same output every time. Try it yourself: set a fixed seed + high temperature and repeat the run several times, and you'll get a deterministic result.
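A quick standalone sketch of that point (illustrative only; llama.cpp seeds its own RNG internally, but the principle is the same): with a fixed seed, the random draws are identical on every run of the same binary, no matter how high the temperature is.
/* Illustration: a fixed seed makes the random stream reproducible across
 * runs of the same binary. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    for (int run = 0; run < 2; run++) {
        srand(1);                        /* like passing --seed 1 */
        printf("run %d:", run);
        for (int i = 0; i < 5; i++) {
            printf(" %d", rand() % 100); /* same 5 numbers both times */
        }
        printf("\n");
    }
    return 0;
}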
The build flags to enable FMA, F16C, and SSE3 are not available in MSVC (at least not in the official documentation): https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170 https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
However, by using Clang, which can be installed from Visual Studio, the result is the same as a Unix build.
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
@nusu-github wouldn't this work?
cmake_minimum_required(VERSION 3.8)
project(YourProjectName LANGUAGES CXX)
# Set C++ standard
set(CMAKE_CXX_STANDARD 17)
# Add the source files
set(SOURCE_FILES main.cpp)
add_executable(YourProjectName ${SOURCE_FILES})
# Enable SSE3, F16C, and FMA for MSVC
if(MSVC)
# Enable SSE3
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:SSE3")
# Enable F16C and FMA by setting AVX or a higher level
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
endif()
"In this example, the /arch:SSE3 flag enables SSE3 support, while the /arch:AVX flag enables F16C and FMA support as part of the AVX instruction set. Note that you should choose the appropriate AVX level (AVX, AVX2, or AVX512) depending on your target CPU."
That's a solution proposed by GPT-4, I'm not sure if this is possible though.
cl: Command line warning D9002: ignoring unknown option '/arch:SSE3'
Unfortunately, this undocumented build flag is not available. However, looking at the pull request in the comment above, it seems that FMA and F16C can be used? (Not sure about SSE3 though)
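For what it's worth, MSVC does expose the FMA and F16C intrinsics through <immintrin.h> even though there is no dedicated /arch switch for them, so an intrinsics-based path can still be compiled with MSVC. A minimal sketch (illustrative only; it assumes the CPU actually supports FMA, otherwise the instruction faults at runtime):
/* Sketch: the FMA intrinsic is available on MSVC via <immintrin.h>
 * (e.g. build with /arch:AVX2); there is no /arch:FMA flag to enable it. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256 a = _mm256_set1_ps(2.0f);
    __m256 b = _mm256_set1_ps(3.0f);
    __m256 c = _mm256_set1_ps(1.0f);
    __m256 r = _mm256_fmadd_ps(a, b, c);  /* 2*3 + 1 = 7 in every lane */

    float out[8];
    _mm256_storeu_ps(out, r);
    printf("%g\n", out[0]);               /* prints 7 */
    return 0;
}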
@nusu-github I'm using these commands to compile with MSVC:
Remove-Item -Path ".\Windows-build\*" -Force -Recurse
cmake -S . -B Windows-build/ -D CMAKE_BUILD_TYPE=Release
cmake --build Windows-build/ --config Release
What should I write to use clang instead?
@BadisG
Clang must be installed using Visual Studio Installer. https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170
If you already have Clang installed, you can now build.
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
command:
cmake -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build Windows-build/ --config Release
Also consider the following options if you are looking to speed up the process.
-DLLAMA_LTO=ON -DLLAMA_NATIVE=ON
command:
cmake -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLAMA_LTO=ON -DLLAMA_NATIVE=ON
@nusu-github thanks, I appreciate it!
But I looked at the CMakeLists.txt and I don't think Clang is activated in it, so it will always create the exe binary through MSVC:
D:\Large Language Models\LlamaCPU\Windows-build>cmake -S .. -B . -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
-- The C compiler identification is MSVC 19.33.31629.0
-- The CXX compiler identification is MSVC 19.33.31629.0
Besides, the exe binaries we can download from this repository are likely made with MSVC, because they also only recognize AVX and AVX2.
@BadisG I forgot about the generator! Now Clang will be used.
-G "Ninja"
command:
cmake -G "Ninja" -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
When executed correctly, it will look like this
-- The C compiler identification is Clang 15.0.1 with GNU-like command-line
-- The CXX compiler identification is Clang 15.0.1 with GNU-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/Llvm/x64/bin/clang.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/Llvm/x64/bin/clang++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
@nusu-github thank you so much, it worked on my side!
Everyone, here are the steps to build your .exe with Clang (better than MSVC at the moment):
- Install Clang. To do this, you have to install LLVM. I downloaded LLVM-16.0.0-win64.exe, but it would be LLVM-16.0.0-win32.exe for people who have 32-bit Windows. During the install, choose "Add LLVM to the system PATH for all users".
After the installation, you can verify that Clang is installed on your computer by running clang --version in your cmd.
Normally you'll get something like this:
C:\Users\Utilisateur>clang --version
clang version 16.0.0
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
- Install Chocolatey (that's necessary to install Ninja, which is necessary to build with Clang 😵). To do that, open PowerShell as an administrator and run this command:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
- Install Ninja. Open PowerShell as an administrator and run this command:
choco install ninja
- Build with Clang. Open your cmd in the llama.cpp repository and run these commands:
cmake -G "Ninja" -S . -B Windows-build/ -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build Windows-build/ --config Release
You'll get your exe here:
.\Windows-build\bin
It would be good to publish these Clang exes in the binary releases instead of the MSVC ones from now on, as Clang is the better compiler on Windows at the moment.
There are always going to be small differences in the generation between different implementations, in this case it is probably because the lack of FMA on MSVC means that some functions use a different path.
@slaren I did a new run with the Clang build (which enables the same instruction sets as GCC) and I still get different outputs.
Windows (clang)
main: seed = 1
system_info: n_threads = 14 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.600000, repeat_last_n = 2048, repeat_penalty = 1.176471
generate: n_ctx = 2024, n_batch = 500, n_predict = 2024, n_keep = 0
Here's 5 reasons that proves why video-games are good for your brain:
1. Video games improve hand eye coordination and reaction time, which is a skill used in everyday life.
2. They help you to focus on the task at hand and not be distracted by other things around you. This helps with concentration skills.
3. It improves problem solving abilities because it requires players to think of different ways to solve problems or puzzles.
4. It can also increase memory retention because they require players to remember certain patterns or sequences.
5. Lastly, playing video games can reduce stress levels and anxiety. [end of text]
llama_print_timings: load time = 22039.07 ms
llama_print_timings: sample time = 442.03 ms / 120 runs ( 3.68 ms per run)
llama_print_timings: prompt eval time = 4958.48 ms / 20 tokens ( 247.92 ms per token)
llama_print_timings: eval time = 51383.99 ms / 119 runs ( 431.80 ms per run)
llama_print_timings: total time = 79489.24 ms
Unix (gcc)
main: seed = 1
system_info: n_threads = 14 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.200000, top_k = 40, top_p = 0.600000, repeat_last_n = 2048, repeat_penalty = 1.176471
generate: n_ctx = 2024, n_batch = 500, n_predict = 2024, n_keep = 0
Here's 5 reasons that proves why video-games are good for your brain:
1. Video games improve hand eye coordination and reaction time, which is a very important skill in sports like basketball or football where you need to react quickly when the ball comes towards you.
2. It improves problem solving skills as well because it requires players to think of different ways to solve problems. For example, if there’s an obstacle blocking your way then you have to find another route around it. This helps with real life situations too!
3. It also increases attention span by keeping kids focused on one task at a time. If they get distracted while playing a game then they won’t be able to complete their goal.
4. It can help develop social skills such as teamwork and communication. Players must work together to accomplish goals. They learn how to communicate effectively through voice chat so that everyone knows what needs to happen next.
5. Lastly, it teaches patience. Sometimes you may not know exactly what to do right away but after some practice you will eventually figure out how to beat the level. [end of text]
llama_print_timings: load time = 31046.48 ms
llama_print_timings: sample time = 2328.97 ms / 219 runs ( 10.63 ms per run)
llama_print_timings: prompt eval time = 4677.21 ms / 20 tokens ( 233.86 ms per token)
llama_print_timings: eval time = 96892.54 ms / 218 runs ( 444.46 ms per run)
llama_print_timings: total time = 139248.72 ms
@BadisG A quick search suggests that clang and GCC may produce slightly different results with floating point operations. You can try building with clang on WSL2 to see if that's the case, but unless there is a significant difference in the perplexity I don't think that this is an issue.
@slaren I agree with you, if the perplexity is similar between GCC and Clang I don't see the problem either. I guess I have to test it out to be sure.
@PriNova Nah, when you fix the seed (I set seed = 1) you'll get the same output every time. Try it yourself: set a fixed seed + high temperature and repeat the run several times, and you'll get a deterministic result.
You're right. I always used -s 0 because I thought zero was in the non-random range. Now I read that <= 0 is part of random generation.
@PriNova -s 0 was in the non-random range last week, I guess they've changed it since 😅
@slaren I agree with you, if the perplexity is similar between GCC and Clang I don't see the problem either. I guess I have to test it out to be sure.
Any update on this? I would also like to know whether the difference in output is due to the different compilers or something else.
This issue was closed because it has been inactive for 14 days since being marked as stale.