llama.cpp
llama.cpp copied to clipboard
Nix flake
This pull request adds a simple Nix Flake for building and distributing the binaries of this repository in a combined package.
The main
binary can be executed like this (assuming experimental-features = nix-command flakes
), for example:
$ nix shell github:niklaskorz/llama.cpp/nix -c llama --help
usage: llama-cpp [options]
options:
-h, --help show this help message and exit
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: random)
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin)
When merged into the main branch of this repository, nix shell github:ggerganov/llama.cpp
would be used instead.
The binary names in the Nix package are:
-
main
=>llama
-
quantize
=>quantize
-
convert-pth-to-ggml.py
=>convert-pth-to-ggml
Note that convert-pth-to-ggml
directly executes in a Python 3.10 environment with the required Python packages.
Can you add the .envrc
to load the flake right away?
A single line .envrc
should do,
use_flake
Do you plan to add support for the C++ part?
BTW, I ended up using this trivial shell.nix
because I haven't learnt how to use flakes yet,
{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-22.11.tar.gz") {} }:
pkgs.mkShell {
buildInputs = with pkgs; [
gnumake
clang
black
python
python310Packages.torch-bin
python310Packages.numpy
python310Packages.sentencepiece
];
}
Differences so far are, GNU Make, clang, and black
.
Do you plan to add support for the C++ part?
BTW, I ended up using this trivial
shell.nix
because I haven't learnt how to use flakes yet,{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-22.11.tar.gz") {} }: pkgs.mkShell { buildInputs = with pkgs; [ gnumake clang black python python310Packages.torch-bin python310Packages.numpy python310Packages.sentencepiece ]; }
Differences so far are, GNU Make, clang, and
black
.
No need to include these, the C/C++ compiler and make (and even CMake I think) are already included automatically by Nix's stdenv
. I am hesitant with including black
unless @ggerganov wants the Python script to explicitly be black
-formatted in the future.
Edit: CMake is not pre-included in stdenv
, but GCC/G++ and GNU make are.
@Dietr1ch I have added shell.nix
and default.nix
for backwards-compatibility with non-flake-nix, as well as the direnv config and a development shell.
NixOS expert here: I have tested this locally and it works perfectly.
I don't see flake-compat being used anywhere, why is it imported?
NixOS expert here: I have tested this locally and it works perfectly.
I don't see flake-compat being used anywhere, why is it imported?
flake-compat has to be referenced by the flake to be included in the lockfile, which shell.nix
and default.nix
then read to load the pinned version of flake-compat
Is it possible to move these files in a subfolder? I want root to be clean as possible
I don't think that is possible, Nix expects flake.nix
and flake.lock
to be at the root of the repository, same goes for direnv with .envrc
. I personally am fine with removing default.nix
and shell.nix
if those are too much, as they are only needed for backwards-compatibility.
Yeah, let's drop shell/default.nix. I think we can also drop direnv - it's not necessary for running llama as nix flake.
Yeah, let's drop shell/default.nix. I think we can also drop direnv - it's not necessary for running llama as nix flake.
Done, and I added .envrc
to gitignore so it's still possible to have this configured locally for anyone who wants to.
Rebased to the latest master commit, now the build fails in Nix:
ggml.c:1364:25: warning: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Wimplicit-function-declaration]
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^
ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_1 = vdotq_s32(vdupq_n_s32(0), v0_1ls, v1_1ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_0 = vdotq_s32(p_0, v0_0hs, v1_0hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_1 = vdotq_s32(p_1, v0_1hs, v1_1hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 4 errors generated.
Interestingly, this does not happen when building in my normal macOS environment without Nix.
Interestingly, this does not happen when building in my normal macOS environment without Nix.
My guess is because stdenv
in nixpkgs for macos is still stuck at Clang 11.x while recent macOS ships Clang 14.x.
The fix would be to override the stdenv for macos to use Clang 14.x.
Interestingly, this does not happen when building in my normal macOS environment without Nix.
My guess is because
stdenv
in nixpkgs for macos is still stuck at Clang 11.x while recent macOS ships Clang 14.x.The fix would be to override the stdenv for macos to use Clang 14.x.
I tried that, it doesn't change anything. The problem appears to be the macOS SDK, not the compiler. Unfortunately, nixpkgs does not include any SDK newer than macOS 12. I suspect that vdotq_s32
has only been added in macOS 13.
Edit: it's defined in /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.0/include/arm_neon.h
, I'll have a look how arm_neon.h
differs from the header included in nixpkg's llvmPackages_14
.
Edit 2: the function is defined in both Nix's standard Clang 11 <arm_neon.h>
as well as nixpkg's llvmPackages_14
... so now to find out what is preventing that to be included.
Edit 3: Thanks to this reddit thread, I have found that __ARM_FEATURE_DOTPROD
is only defined in Xcode 13 and newer, and apparently not in any Clang version shipped by Nix for macOS. Adding -D__ARM_FEATURE_DOTPROD
as compiler flags makes it compile (and link) without any issues. I know that https://github.com/ggerganov/llama.cpp/commit/47857e564c218a2c38346d0cdd94314632878fcb now made this optional which would also fix this error, but I think we could manually add the define for the flake only (possibly by passing it to CMake?).
The flake package is now built with CMake instead of Make and passes -DCMAKE_C_FLAGS=-D__ARM_FEATURE_DOTPROD=1=1
only for macOS on ARM (i.e., aarch64-darwin
). So everything is working again!
Also, the flake package is now using the same binary names as in CMakeLists (llama
and quantize
).
So I am thinking this kind of deploys (together with docker #132) should probably live in a separate repo(s). What I don't like it that we are cluttering the root source tree. Another problem is we have to sync the build scripts with every change.
The alternative is to create for example llama.nix
or llama.docker
and they can have llama.cpp
as submodule. All necessary build files will be there and they will be synched to the specific submodule version. It could also be one common repo for all these kind of deploy systems: llama.deploy
.
For now I think we will merge this to make it easily available during the initial interest, but later we should probably do this. Any comments?
As a person interested in a flake.nix
a good future point to remove it for uncluttering is when it is available on NixOS/nixpkgs so that people can do nix run nixpkgs#llama
.
For now, having a flake.nix so that we can do nix run github:ggerganov/llama.cpp
is a great step forward.
@prusnak Merge if you approve