
How to build on windows?

Open Zerogoki00 opened this issue 1 year ago • 18 comments

Please give instructions. There is nothing in the README, but it says that Windows is supported.

Zerogoki00 avatar Mar 13 '23 20:03 Zerogoki00

At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install CMake, you can run

cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release

cmake --build build/ --config Release

I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me.
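
(For what it's worth: with the Visual Studio generator, which is multi-config, CMAKE_BUILD_TYPE is ignored and the --config Release flag on the build step is what selects the Release build; with a single-config generator such as Ninja or Unix Makefiles it's the configure step that decides, e.g.:

cmake -S . -B build/ -G Ninja -D CMAKE_BUILD_TYPE=Release
cmake --build build/

so passing both, as above, harmlessly covers either case.)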

Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main

.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

cgcha avatar Mar 13 '23 21:03 cgcha

I usually run Linux, so I'm pretty unfamiliar with CMake, and there are probably better conventions for how to do this cleanly. I also tried everything in WSL and it seems to work fine.

cgcha avatar Mar 13 '23 21:03 cgcha

Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:

  • Install Strawberry Perl: https://strawberryperl.com/
  • Install Chocolatey: https://chocolatey.org/
  • Install Make distributed with Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

fgblanch avatar Mar 14 '23 01:03 fgblanch

I recommend using WSL2 on Windows; that's what I used and everything worked fine. I followed the steps for running the model from here - https://til.simonwillison.net/llms/llama-7b-m2
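
For anyone starting from scratch, a minimal sketch of that route (assumes a recent Windows 10/11; Ubuntu is just the default distro choice):

wsl --install -d Ubuntu
# then, inside the Ubuntu shell:
sudo apt update && sudo apt install build-essential
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make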

akshay-verma avatar Mar 14 '23 02:03 akshay-verma

Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:

  • Install Strawberry Perl: https://strawberryperl.com/
  • Install Chocolatey: https://chocolatey.org/
  • Install Make distributed with Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

YongeBai avatar Mar 14 '23 02:03 YongeBai

Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:

  • Install Strawberry Perl: https://strawberryperl.com/
  • Install Chocolatey: https://chocolatey.org/
  • Install Make distributed with Chocolatey: choco install make

set CC=C:\Strawberry\c\bin\gcc.exe

set CXX=C:\Strawberry\c\bin\g++.exe

make

quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin 2

main.exe -m q4_0.bin -t 8 -n 128

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

It seems you forgot to set gcc as the CC command. Try running:

set CC=C:\Strawberry\c\bin\gcc.exe
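
and, to double-check that the variable actually points at gcc before re-running make (cmd syntax, path as above):

echo %CC%
%CC% --version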

fgblanch avatar Mar 14 '23 02:03 fgblanch

main: prompt: 'The first man on the moon was'
main: number of tokens in prompt = 8
     1 -> ''
  1576 -> 'The'
   937 -> ' first'
   767 -> ' man'
   373 -> ' on'
   278 -> ' the'
 18786 -> ' moon'
   471 -> ' was'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


The first man on the moon was a geologist, and he brought his hammer.
Inside Out is an amazing movie that will take you through all kinds of emotions in its 90 minute run time (and maybe even more during your afterthoughts). The film tells about Riley's journey when she moves from Minnesota to San Francisco for a new job opportunity and how her parents, boyfriend Oliver Tate (!) and friends help her cope with that.
The animation looks great as always in Pixar productions but even more importantly the characters feel believable – if you would have asked me before I watched Inside

main: mem per token = 14565444 bytes
main:     load time =  1157.11 ms
main:   sample time =   114.25 ms
main:  predict time = 19469.45 ms / 144.22 ms per token
main:    total time = 21031.82 ms

It works great on Windows using CMake, though -t 16 is no faster than -t 8 on a Ryzen 9 5950X. I regenerated the prompt a couple of times on 7B, and about half the time it gets it right.
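
A likely explanation for the flat thread scaling is that token generation is memory-bandwidth bound rather than compute bound, so extra threads stop helping. If you want to pin -t to the physical core count, it can be queried from cmd with:

WMIC CPU Get NumberOfCores,NumberOfLogicalProcessors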

kaliber91 avatar Mar 14 '23 03:03 kaliber91

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

param([string]$modelPath, [switch]$removeF16)

# Quantize every f16 model file in the given folder to q4_0.
Get-ChildItem $modelPath -Filter ggml-model-f16.bin* |
ForEach-Object {
    $newName = $_.FullName.Replace("f16", "q4_0")
    # Mode 2 selects q4_0 quantization; -Wait blocks until quantize.exe finishes.
    Start-Process -FilePath ".\build\Release\quantize.exe" -ArgumentList $_.FullName, $newName, "2" -Wait
    if ($removeF16) {
        # Optionally delete the original f16 file to reclaim disk space.
        Remove-Item $_.FullName
    }
}

Call it like this:

.\quantize.ps1 -modelPath "C:\PathToModels\65B" or .\quantize.ps1 -modelPath "C:\PathToModels\65B" -removeF16

Just thought I'd share this quickly thrown-together PowerShell script as a Windows version of quantize.sh.

Christoph-Wagner avatar Mar 14 '23 10:03 Christoph-Wagner

@kaliber91 7B was terrible for me as well. 13B was a bit better.

akshay-verma avatar Mar 15 '23 07:03 akshay-verma

Solving some common issues people might come across on the latest version of Python when installing the requirements.

This is here specifically because Windows installs of Python have compatibility issues with the chosen packages.

python -m pip install numpy
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html (About a 3GB download)
pip install .\sentencepiece-0.1.97-cp311-cp311-win_amd64.whl

The sentencepiece-0.1.97-cp311-cp311-win_amd64.whl file is from here inside the wheelhouse folder.
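
As a quick sanity check that all three packages import cleanly (assuming the python on PATH is the CPython 3.11 build the wheel targets):

python -c "import numpy, torch, sentencepiece; print(numpy.__version__, torch.__version__, sentencepiece.__version__)"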

Reelix avatar Mar 15 '23 08:03 Reelix

If you're running WSL2, it requires creating or modifying a .wslconfig file in your user folder.

%USERPROFILE%\.wslconfig:

[wsl2]
memory=12GB
processors=6
swap=4GB
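
Note that changes to .wslconfig only take effect after the WSL VM restarts; from a Windows shell:

wsl --shutdown

then reopen your distro.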

My Setup

  • RAM: 16GB DDR4
  • CPU: Ryzen 7 7500G
  • SSD: 480GB
  • OS: Windows 11

With this configuration I succeeded in making the model conversions. However, when running main it is still slow while loading the model and continuously consumes a lot of memory.


kassane avatar Mar 15 '23 17:03 kassane

Another reference:

  • #22

kassane avatar Mar 19 '23 23:03 kassane

I've manually built it using g++ via cmake and make from the MSYS2 distro.

Brawlence avatar Mar 27 '23 06:03 Brawlence

Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:

  • Install Strawberry Perl: https://strawberryperl.com/
  • Install Chocolatey: https://chocolatey.org/
  • Install Make distributed with Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

Tried these steps, ran into this error. Any ideas?

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
/usr/bin/bash: cc: command not found
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX: g++.exe (i686-posix-dwarf, Built by strawberryperl.com project) 8.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
process_begin: CreateProcess(NULL, cc -I. -O3 -DNDEBUG -std=c11 -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [Makefile:186: ggml.o] Error 2

I got the same error ("The system cannot find the file specified") while trying to start the build with CMake, even though I put the following at the beginning of my CMakeLists.txt file:

set( CMAKE_CXX_COMPILER "C:/MinGW/bin/g++.exe" )
set( CMAKE_C_COMPILER "C:/MinGW/bin/gcc.exe" )

Also, when I try g++ --version, I can see that I'm on 6.3.0, so my MinGW is properly installed. Any idea what could go wrong? :(

Nephistos avatar Mar 27 '23 14:03 Nephistos

There is a very easy way to build on Windows using mingw32 compilation in MSYS2:

  1. Download msys2-x86_64-20230318 from https://www.msys2.org/
  2. Run the installer: click Next, Next, wait for the install to complete, then press Finish
  3. Run C:\msys64\mingw64.exe
  4. Install the required packages:
     pacman -S git
     pacman -S mingw-w64-x86_64-gcc
     pacman -S make
  5. Clone the library of POSIX functions that llama.cpp needs:
     git clone https://github.com/CoderRC/libmingw32_extended.git
     cd libmingw32_extended
  6. Build the library:
     mkdir build
     cd build
     ../configure
     make
  7. Install the library: make install
  8. Change directory: cd ~
  9. Clone llama.cpp:
     git clone https://github.com/ggerganov/llama.cpp
     cd llama.cpp
  10. Build llama.cpp:
     make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
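
Once the build finishes, the binaries run straight from the same MSYS2 shell; for example, with a quantized model prepared as in the earlier comments:

./main.exe -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128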

CoderRC avatar Mar 29 '23 00:03 CoderRC

At this point, there's support for CMake. The Python segments of the README should basically be the same. Once you install CMake, you can run

cmake -S . -B build/ -D CMAKE_BUILD_TYPE=Release

cmake --build build/ --config Release

I'm not actually sure if you need CMAKE_BUILD_TYPE=Release for the first command, but it ran for me.

Afterwards, the exe files should be in the build/Release folder, and you can call them in place of ./quantize and ./main

.\build\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

.\build\Release\llama.exe -m .\models\7B\ggml-model-q4_0.bin -t 8 -n 128

The current README points to a shell script for quantizing, but you can refer to an older version of the README for manual instructions.

Hello, I can't find quantize.exe and llama.exe, only llama.lib, in \build\Release. Why?

12lxr avatar Apr 04 '23 03:04 12lxr

. . .

Hello, I can't find quantize.exe and llama.exe, only llama.lib, in \build\Release. Why?

Hey, all the .exe files will be located in /llama.cpp/build/bin/ after running the cmake commands. You just need to copy and paste them into the /llama.cpp/ directory.
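
For example, from the repository root in cmd (adjust the path if your generator puts the binaries elsewhere):

copy .\build\bin\*.exe .\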

TortueSandwich avatar Apr 08 '23 17:04 TortueSandwich

Probably over-engineered, but I just got it working on Windows by using the gcc compiler included with Strawberry Perl and Make distributed via Chocolatey:

  • Install Strawberry Perl: https://strawberryperl.com/
  • Install Chocolatey: https://chocolatey.org/
  • Install Make distributed with Chocolatey: choco install make
set CC=C:\Strawberry\c\bin\gcc.exe
set CXX=C:\Strawberry\c\bin\g++.exe 
make
quantize.exe .\models\7B\ggml-model-f16.bin q4_0.bin  2
main.exe -m q4_0.bin -t 8 -n 128 

@fgblanch Looking forward to your help, thank you!

process_begin: CreateProcess(NULL, uname -s, ...) failed.
Makefile:2: pipe: No error
process_begin: CreateProcess(NULL, uname -p, ...) failed.
Makefile:6: pipe: No error
process_begin: CreateProcess(NULL, uname -m, ...) failed.
Makefile:10: pipe: No error
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -march=native -mtune=native
I LDFLAGS:
I CC: gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
I CXX: g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
……………………………………………………
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
llama.cpp: In instantiation of 'T checked_mul(T, T) [with T = unsigned int]':
llama.cpp:363:72: required from here
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: unknown conversion type character 'l' in format [-Wformat=]
llama.cpp:246:22: warning: too many arguments for format [-Wformat-extra-args]
make: *** [Makefile:146: llama.o] Error 1

iMountTai avatar Apr 10 '23 12:04 iMountTai

You saved me hours! Thank you so much.

I expanded on your make command just a little to include OpenCL support:

make LLAMA_CLBLAST=1 LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended -lclblast -lOpenCL' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -I./common -I/mingw64/include/CL -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'

Extra packages I needed: mingw-w64-x86_64-clblast, mingw-w64-x86_64-opencl-headers, mingw-w64-x86_64-opencl-icd
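
For reference, those can be installed in the same MSYS2 mingw64 shell with:

pacman -S mingw-w64-x86_64-clblast mingw-w64-x86_64-opencl-headers mingw-w64-x86_64-opencl-icd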

ldd on quantize.exe after a successful build:

Admin@nidhogg MINGW64 ~/llama.cpp
$ ldd ./quantize.exe
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ff81e190000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ff81caa0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ff81b700000)
        msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ff81cd40000)
        libgcc_s_seh-1.dll => /mingw64/bin/libgcc_s_seh-1.dll (0x7ff80d3b0000)
        OpenCL.dll => /c/WINDOWS/SYSTEM32/OpenCL.dll (0x7fffec660000)
        libclblast.dll => /mingw64/bin/libclblast.dll (0x7fff87d00000)
        combase.dll => /c/WINDOWS/System32/combase.dll (0x7ff81dd40000)
        libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x7ff817580000)
        ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ff81bab0000)
        RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ff81cb70000)
        libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x26f583d0000)
        ADVAPI32.dll => /c/WINDOWS/System32/ADVAPI32.dll (0x7ff81c9f0000)
        libstdc++-6.dll => /mingw64/bin/libstdc++-6.dll (0x7fffcfea0000)
        sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7ff81da30000)
        ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ff81c770000)
        msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ff81b660000)
        CFGMGR32.dll => /c/WINDOWS/SYSTEM32/CFGMGR32.dll (0x7ff81b230000)
        GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ff81e120000)
        win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ff81b5b0000)
        gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ff81bbd0000)
        USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ff81cdf0000)

Exciting times in open source these days!

There is a very easy way to build on Windows using mingw32 compilation in MSYS2:

  1. Download msys2-x86_64-20230318 from https://www.msys2.org/
  2. Run the installer: click Next, Next, wait for the install to complete, then press Finish
  3. Run C:\msys64\mingw64.exe
  4. Install the required packages:
     pacman -S git
     pacman -S mingw-w64-x86_64-gcc
     pacman -S make
  5. Clone the library of POSIX functions that llama.cpp needs:
     git clone https://github.com/CoderRC/libmingw32_extended.git
     cd libmingw32_extended
  6. Build the library:
     mkdir build
     cd build
     ../configure
     make
  7. Install the library: make install
  8. Change directory: cd ~
  9. Clone llama.cpp:
     git clone https://github.com/ggerganov/llama.cpp
     cd llama.cpp
  10. Build llama.cpp:
     make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'

arcadiancomp avatar Aug 28 '23 23:08 arcadiancomp

This works.

git clone --recurse-submodules https://github.com/ggerganov/llama.cpp
export CC=gcc
export CPP=g++
export LDFLAGS='-D_POSIX_MAPPED_FILES -DLLAMA_NATIVE=ON -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON -DLLMODEL_CUDA=OFF -static'
git reset --hard
git clean -fd
git pull
cd llama.cpp
mingw32-make.exe -j 6

0wwafa avatar May 20 '24 21:05 0wwafa

Appreciate it, we've been using llama.cpp for local inference on 20x RTX 3070 Ti's and it is amazing. Can't wait to try it out on Blackwell GPUs soon.


arcadiancomp avatar May 20 '24 23:05 arcadiancomp