
Nix flake

niklaskorz opened this pull request 1 year ago • 14 comments

This pull request adds a simple Nix flake for building and distributing the binaries of this repository as a combined package.

The main binary can be executed like this (assuming experimental-features = nix-command flakes is enabled):

$ nix shell github:niklaskorz/llama.cpp/nix -c llama --help
usage: llama-cpp [options]

options:
  -h, --help            show this help message and exit
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -n N, --n_predict N   number of tokens to predict (default: 128)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --temp N              temperature (default: 0.8)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/llama-7B/ggml-model.bin)

When merged into the main branch of this repository, nix shell github:ggerganov/llama.cpp would be used instead.

The binary names in the Nix package are:

  • main => llama
  • quantize => quantize
  • convert-pth-to-ggml.py => convert-pth-to-ggml

Note that convert-pth-to-ggml runs directly in a Python 3.10 environment with the required Python packages.
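
The renaming and the Python wrapper described above could be done along these lines in the derivation's install phase. This is only a sketch of the idea, not the PR's actual flake.nix; the llama-python name and the exact install commands are assumptions:

```nix
let
  # Python 3.10 environment with the packages the conversion script needs
  # (name `llama-python` is made up for this sketch)
  llama-python = pkgs.python310.withPackages
    (ps: with ps; [ torch numpy sentencepiece ]);
in
pkgs.stdenv.mkDerivation {
  pname = "llama.cpp";
  version = "master";
  src = ./.;
  installPhase = ''
    mkdir -p $out/bin
    mv main $out/bin/llama          # main => llama
    mv quantize $out/bin/quantize   # quantize => quantize
    # Wrap the conversion script so it runs in the Python environment above:
    echo '#!${llama-python}/bin/python' > $out/bin/convert-pth-to-ggml
    cat ${./convert-pth-to-ggml.py} >> $out/bin/convert-pth-to-ggml
    chmod +x $out/bin/convert-pth-to-ggml
  '';
}
```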

niklaskorz avatar Mar 12 '23 09:03 niklaskorz

Can you add the .envrc to load the flake right away?

A single-line .envrc should do:

use_flake

Dietr1ch avatar Mar 13 '23 01:03 Dietr1ch

Do you plan to add support for the C++ part?

BTW, I ended up using this trivial shell.nix because I haven't learnt how to use flakes yet:

{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-22.11.tar.gz") {} }:

pkgs.mkShell {
  buildInputs = with pkgs; [
    gnumake
    clang

    black
    python310
    python310Packages.torch-bin
    python310Packages.numpy
    python310Packages.sentencepiece
  ];
}

Differences so far are GNU Make, clang, and black.

Dietr1ch avatar Mar 13 '23 01:03 Dietr1ch

> Do you plan to add support for the C++ part?
>
> BTW, I ended up using this trivial shell.nix because I haven't learnt how to use flakes yet, […]
>
> Differences so far are GNU Make, clang, and black.

No need to include these - the C/C++ compiler and make (and even CMake, I think) are already included automatically by Nix's stdenv. I am hesitant about including black unless @ggerganov wants the Python script to be explicitly black-formatted in the future.

Edit: CMake is not pre-included in stdenv, but GCC/G++ and GNU make are.
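
Given that, a development shell only needs to list the genuinely extra tools. A minimal sketch (package choice is illustrative, not the PR's actual devShell):

```nix
# stdenv already provides gcc/g++ and GNU make, so mkShell only needs
# the extras (sketch; adjust the package list to taste):
pkgs.mkShell {
  packages = with pkgs; [
    cmake  # not part of stdenv, per the edit above
    (python310.withPackages (ps: with ps; [ torch numpy sentencepiece ]))
  ];
}
```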

niklaskorz avatar Mar 13 '23 09:03 niklaskorz

@Dietr1ch I have added shell.nix and default.nix for backwards compatibility with non-flake Nix, as well as the direnv config and a development shell.

niklaskorz avatar Mar 13 '23 09:03 niklaskorz

NixOS expert here: I have tested this locally and it works perfectly.

I don't see flake-compat being used anywhere, why is it imported?

prusnak avatar Mar 14 '23 12:03 prusnak

> NixOS expert here: I have tested this locally and it works perfectly.
>
> I don't see flake-compat being used anywhere, why is it imported?

flake-compat has to be referenced by the flake to be included in the lockfile; shell.nix and default.nix then read the lockfile to load the pinned version of flake-compat.
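
For reference, the usual flake-compat shim looks roughly like this - default.nix fetches the flake-compat revision pinned in flake.lock and uses it to evaluate flake.nix for non-flake Nix. This is a sketch of the common pattern, not necessarily this PR's exact file:

```nix
# default.nix (common flake-compat shim, sketched):
# read the pinned flake-compat revision out of flake.lock,
# fetch it, and expose the flake's default package.
(import (
  let lock = builtins.fromJSON (builtins.readFile ./flake.lock); in
  fetchTarball {
    url = "https://github.com/edolstra/flake-compat/archive/"
      + "${lock.nodes.flake-compat.locked.rev}.tar.gz";
    sha256 = lock.nodes.flake-compat.locked.narHash;
  }
) { src = ./.; }).defaultNix
```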

niklaskorz avatar Mar 14 '23 12:03 niklaskorz

> Is it possible to move these files into a subfolder? I want the root to be as clean as possible.

I don't think that is possible: Nix expects flake.nix and flake.lock to be at the root of the repository, and the same goes for direnv with .envrc. I personally am fine with removing default.nix and shell.nix if those are too much, as they are only needed for backwards compatibility.

niklaskorz avatar Mar 14 '23 17:03 niklaskorz

Yeah, let's drop shell/default.nix. I think we can also drop direnv - it's not necessary for running llama as a Nix flake.

prusnak avatar Mar 14 '23 17:03 prusnak

> Yeah, let's drop shell/default.nix. I think we can also drop direnv - it's not necessary for running llama as a Nix flake.

Done, and I added .envrc to .gitignore, so anyone who wants this can still configure it locally.

niklaskorz avatar Mar 14 '23 17:03 niklaskorz

Rebased onto the latest master commit; now the build fails in Nix:

ggml.c:1364:25: warning: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Wimplicit-function-declaration]
        int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
                        ^
ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
        int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
                  ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
        int32x4_t p_1 = vdotq_s32(vdupq_n_s32(0), v0_1ls, v1_1ls);
                  ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
        p_0 = vdotq_s32(p_0, v0_0hs, v1_0hs);
            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
        p_1 = vdotq_s32(p_1, v0_1hs, v1_1hs);
            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 4 errors generated.

Interestingly, this does not happen when building in my normal macOS environment without Nix.

niklaskorz avatar Mar 14 '23 18:03 niklaskorz

> Interestingly, this does not happen when building in my normal macOS environment without Nix.

My guess is that stdenv in nixpkgs for macOS is still stuck at Clang 11.x, while recent macOS ships Clang 14.x.

The fix would be to override the stdenv for macOS to use Clang 14.x.

prusnak avatar Mar 14 '23 18:03 prusnak

>> Interestingly, this does not happen when building in my normal macOS environment without Nix.
>
> My guess is that stdenv in nixpkgs for macOS is still stuck at Clang 11.x, while recent macOS ships Clang 14.x.
>
> The fix would be to override the stdenv for macOS to use Clang 14.x.

I tried that; it doesn't change anything. The problem appears to be the macOS SDK, not the compiler. Unfortunately, nixpkgs does not include any SDK newer than macOS 12. I suspect that vdotq_s32 was only added in macOS 13.

Edit: it's defined in /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.0/include/arm_neon.h; I'll have a look at how that arm_neon.h differs from the header included in nixpkgs' llvmPackages_14.

Edit 2: the function is defined both in Nix's standard Clang 11 <arm_neon.h> and in nixpkgs' llvmPackages_14... so now to find out what is preventing it from being included.

Edit 3: Thanks to this Reddit thread, I have found that __ARM_FEATURE_DOTPROD is only defined in Xcode 13 and newer, and apparently not in any Clang version shipped by Nix for macOS. Adding -D__ARM_FEATURE_DOTPROD as a compiler flag makes it compile (and link) without any issues. I know that https://github.com/ggerganov/llama.cpp/commit/47857e564c218a2c38346d0cdd94314632878fcb has now made this optional, which would also fix the error, but I think we could manually add the define for the flake only (possibly by passing it to CMake?).

niklaskorz avatar Mar 14 '23 18:03 niklaskorz

The flake package is now built with CMake instead of Make and passes -DCMAKE_C_FLAGS=-D__ARM_FEATURE_DOTPROD=1 only for macOS on ARM (i.e., aarch64-darwin). So everything is working again!
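
In a flake, passing that define only on Apple Silicon could look roughly like this (a sketch; the surrounding derivation and exact attribute usage are assumptions):

```nix
# Sketch: add the DOTPROD define only on aarch64-darwin, where the
# Nix-provided Clang does not set __ARM_FEATURE_DOTPROD itself.
cmakeFlags = pkgs.lib.optionals
  (pkgs.stdenv.isDarwin && pkgs.stdenv.isAarch64)
  [ "-DCMAKE_C_FLAGS=-D__ARM_FEATURE_DOTPROD=1" ];
```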

Also, the flake package now uses the same binary names as CMakeLists (llama and quantize).

niklaskorz avatar Mar 14 '23 20:03 niklaskorz

So I am thinking these kinds of deploys (together with Docker, #132) should probably live in a separate repo (or repos). What I don't like is that we are cluttering the root source tree. Another problem is that we have to sync the build scripts with every change.

The alternative is to create, for example, llama.nix or llama.docker repositories that have llama.cpp as a submodule. All the necessary build files would live there, synced to a specific submodule version. It could also be one common repo for all these kinds of deploy systems: llama.deploy.

For now I think we will merge this to make it easily available during the initial interest, but later we should probably do this. Any comments?

ggerganov avatar Mar 15 '23 19:03 ggerganov

As someone interested in a flake.nix: a good future point to remove it (for decluttering) is when llama.cpp becomes available in NixOS/nixpkgs, so that people can do nix run nixpkgs#llama.

For now, having a flake.nix so that we can do nix run github:ggerganov/llama.cpp is a great step forward.

ingenieroariel avatar Mar 17 '23 21:03 ingenieroariel

@prusnak Merge if you approve

ggerganov avatar Mar 17 '23 21:03 ggerganov