llama.cpp
How to build on Windows?
Please give instructions. There is nothing in the README, but it says that Windows is supported.
At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install it, you can run
cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release
cmake --build build/ --config Release
I'm not actually sure if you need CMAKE_BUILD_TYPE=Release
for the first command, but it ran for me.
Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main
.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2
.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128
The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.
I usually run Linux, so I'm pretty unfamiliar with CMake, and there are probably better conventions for how to do this cleanly. I also tried everything in WSL and it seems to work fine.
Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:
- Install Strawberry Perl: https://strawberryperl.com/
- Install Chocolatey: https://chocolatey.org/
- Install Make via Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin 2
main.exe -m q4_0.bin -t 8 -n 128
I'd recommend using WSL2 on Windows; that's what I used and everything worked fine. I followed the steps for running the model from here: https://til.simonwillison.net/llms/llama-7b-m2
Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:
- Install Strawberry Perl: https://strawberryperl.com/
- Install Chocolatey: https://chocolatey.org/
- Install Make via Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin 2
main.exe -m q4_0.bin -t 8 -n 128
Tried these steps, ran into this error. Any ideas?
process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2
It seems you forgot to set gcc as the CC compiler. Try running:
set CC=C:\Strawberry\c\bin\gcc.exe
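One caveat worth adding (my note, not from the original poster): `set` is cmd.exe syntax and only lasts for the current console session, so it must be run in the same window where you later run make. If you are in PowerShell instead, the equivalent would be:

```powershell
# PowerShell equivalent of the cmd.exe `set` lines above
$env:CC  = "C:\Strawberry\c\bin\gcc.exe"
$env:CXX = "C:\Strawberry\c\bin\g++.exe"
```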
main: prompt: 'The first man on the moon was'
main: number of tokens in prompt = 8
1 -> ''
1576 -> 'The'
937 -> ' first'
767 -> ' man'
373 -> ' on'
278 -> ' the'
18786 -> ' moon'
471 -> ' was'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
The first man on the moon was a geologist, and he brought his hammer.
Inside Out is an amazing movie that will take you through all kinds of emotions in its 90 minute run time (and maybe even more during your afterthoughts). The film tells about Riley's journey when she moves from Minnesota to San Francisco for a new job opportunity and how her parents, boyfriend Oliver Tate (!) and friends help her cope with that.
The animation looks great as always in Pixar productions but even more importantly the characters feel believable; if you would have asked me before I watched Inside
main: mem per token = 14565444 bytes
main: load time = 1157.11 ms
main: sample time = 114.25 ms
main: predict time = 19469.45 ms / 144.22 ms per token
main: total time = 21031.82 ms
It works great on Windows using CMake, though -t 16 is no faster than -t 8 on a Ryzen 9 5950X. I regenerated the prompt a couple of times on 7B, and about half the time it gets it right.
The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.
param([string]$modelPath, [switch]$removeF16)
# Quantize every f16 model file found under $modelPath to q4_0
Get-ChildItem $modelPath -Filter ggml-model-f16.bin* |
Foreach-Object {
    $newName = $_.FullName.Replace("f16","q4_0");
    Start-Process -FilePath ".\build\Release\quantize.exe" -ArgumentList $_.FullName, $newName, "2" -Wait
    # Optionally remove the original f16 file afterwards
    if ($removeF16) {
        Remove-Item $_.FullName
    }
}
Call it like this:
.\quantize.ps1 -modelPath "C:\PathToModels\65B"
or .\quantize.ps1 -modelPath "C:\PathToModels\65B" -removeF16
Just thought I'd share this quickly thrown-together PowerShell script as a Windows version of quantize.sh.
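For anyone not on PowerShell, the same idea can be sketched cross-platform in Python (a hypothetical helper of mine, not part of llama.cpp; the quantize binary path and the "2" format code are taken from the script above):

```python
from pathlib import Path
import subprocess

def quantize_dir(model_path, quantize_exe=r".\build\Release\quantize.exe",
                 remove_f16=False, run=subprocess.run):
    """Quantize every ggml-model-f16.bin* file under model_path to q4_0."""
    produced = []
    for f16 in sorted(Path(model_path).glob("ggml-model-f16.bin*")):
        # Mirror the PowerShell rename: "f16" -> "q4_0" in the file name
        q4 = f16.with_name(f16.name.replace("f16", "q4_0"))
        run([quantize_exe, str(f16), str(q4), "2"], check=True)
        if remove_f16:
            f16.unlink()  # like the -removeF16 switch
        produced.append(q4)
    return produced
```

The `run` parameter is only there so the subprocess call can be stubbed out; by default it invokes the quantize binary directly.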
@kaliber91 7B was terrible for me as well. 13B was a bit better.
Solving some common issues people might come across on the latest version of Python when installing the requirements.
This is here specifically because Windows installs of Python can have compatibility issues with the chosen packages.
python -m pip install numpy
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html (About a 3GB download)
pip install .\sentencepiece-0.1.97-cp311-cp311-win_amd64.whl
The sentencepiece-0.1.97-cp311-cp311-win_amd64.whl file is from here inside the wheelhouse folder.
If you're running WSL2, it requires the creation or modification of a .wslconfig file in your user folder: %USERPROFILE%\.wslconfig
[wsl2]
memory=12GB
processors=6
swap=4GB
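As a quick sanity check (my suggestion, not part of the original comment): .wslconfig is plain INI syntax, so its values can be parsed with Python's standard configparser to confirm the file is well-formed; WSL itself reads the file directly.

```python
import configparser

# The .wslconfig contents from above
WSLCONFIG = """\
[wsl2]
memory=12GB
processors=6
swap=4GB
"""

cfg = configparser.ConfigParser()
cfg.read_string(WSLCONFIG)  # raises on malformed INI
print(cfg["wsl2"]["memory"], cfg.getint("wsl2", "processors"))
```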
My Setup
- RAM: 16GB DDR4
- CPU: Ryzen 7 7500G
- SSD: 480GB
- OS: Windows 11
With this configuration I succeeded in making the model conversions. However, when running main
it is still slow to load the model and continuously consumes a lot of memory.
Edit: another reference:
- #22
I've manually built it using g++ via CMake and Make from the MSYS2 distro.
I got the same error ("The system cannot find the file specified") while trying to start the build with CMake, even though I put the following at the beginning of my CMakeLists.txt file:
set( CMAKE_CXX_COMPILER "C:/MinGW/bin/g++.exe" )
set( CMAKE_C_COMPILER "C:/MinGW/bin/gcc.exe" )
Also, when I try g++ --version, I can see that I'm on 6.3.0, so my MinGW is properly installed. Any idea what could be wrong? :(
There is a very easy way to build on Windows using mingw32 compilation in MSYS2.
- Download msys2-x86_64-20230318 from https://www.msys2.org/
- Open the installer, click Next through the prompts, wait for the install to complete, then press Finish
- Run C:\msys64\mingw64.exe
- Install the required packages: pacman -S git mingw-w64-x86_64-gcc make
- Clone the library for POSIX functions that llama.cpp needs: git clone https://github.com/CoderRC/libmingw32_extended.git then cd libmingw32_extended
- Build the library: mkdir build, cd build, ../configure, make
- Install the library: make install
- Change directory: cd ~
- Clone llama.cpp: git clone https://github.com/ggerganov/llama.cpp cd llama.cpp
- Build llama.cpp: make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
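Laid out as a single MinGW64 shell session, the steps above would look roughly like this (same commands as the list, one per line; this is a sketch of the order, not a tested script):

```
# Run inside C:\msys64\mingw64.exe
pacman -S git mingw-w64-x86_64-gcc make

# Build and install the POSIX-compatibility library llama.cpp needs
git clone https://github.com/CoderRC/libmingw32_extended.git
cd libmingw32_extended
mkdir build
cd build
../configure
make
make install

# Build llama.cpp itself
cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
```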
At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install it, you can run
cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release
cmake --build build/ --config Release
I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me. Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main
.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2
.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128
The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.
Hello, I can't find quantize.exe and llama.exe, only llama.lib in \build\Release.
Why?
Hey, all the .exe files will be located in /llama.cpp/build/bin/ after running the cmake commands. You just need to copy and paste them into the /llama.cpp/ directory.
@fgblanch Look forward to your help, thank you!
process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -march=native -mtune=native
I LDFLAGS:
I CC: gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
I CXX: g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
……………………………………………………
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
llama.cpp: In instantiation of 'T checked_mul(T, T) [with T = unsigned int]':
llama.cpp:363:72: required from here
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
make: *** [Makefile:146: llama.o] Error 1
You saved me hours! Thank you so much.
I expanded on your make command just a little to include OpenCL support:
make LLAMA_CLBLAST=1 LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended -lclblast -lOpenCL' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -I./common -I/mingw64/include/CL -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
Extra packages I needed: mingw-w64-x86_64-clblast, mingw-w64-x86_64-opencl-headers, mingw-w64-x86_64-opencl-icd
ldd on quantize.exe after a successful build:
Admin@nidhogg MINGW64 ~/llama.cpp
$ ldd ./quantize.exe
ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ff81e190000)
KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ff81caa0000)
KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ff81b700000)
msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ff81cd40000)
libgcc_s_seh-1.dll => /mingw64/bin/libgcc_s_seh-1.dll (0x7ff80d3b0000)
OpenCL.dll => /c/WINDOWS/SYSTEM32/OpenCL.dll (0x7fffec660000)
libclblast.dll => /mingw64/bin/libclblast.dll (0x7fff87d00000)
combase.dll => /c/WINDOWS/System32/combase.dll (0x7ff81dd40000)
libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x7ff817580000)
ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ff81bab0000)
RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ff81cb70000)
libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x26f583d0000)
ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ff81c9f0000)
libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x7fffcfea0000)
sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ff81da30000)
ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ff81c770000)
msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ff81b660000)
CFGMGR32.dll => /c/WINDOWS/SYSTEM32/CFGMGR32.dll (0x7ff81b230000)
GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ff81e120000)
win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ff81b5b0000)
gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ff81bbd0000)
USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ff81cdf0000)
Exciting times in open source these days!
This works.
git clone --recurse-submodules https://github.com/ggerganov/llama.cpp
cd llama.cpp
git reset --hard
git clean -fd
git pull
export CC=gcc
export CXX=g++
export LDFLAGS='-D_POSIX_MAPPED_FILES -DLLAMA_NATIVE=ON -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON -DLLMODEL_CUDA=OFF -static'
mingw32-make.exe -j 6
Appreciate it, we've been using llama.cpp for local inference on 20x RTX 3070 Ti's and it is amazing. Can't wait to try it out on Blackwell GPUs soon.