llama.cpp
LLM inference in C/C++
This commit adds two new files, Windows-installer.bat and Windows-model_conversion.bat, both of which make using llama.cpp on Windows easier. Windows-installer.bat installs dependencies such as Python, and Windows-model_conversion.bat converts the...
```python
#!/usr/bin/env python3
import os
import sys

if not (len(sys.argv) == 2 and sys.argv[1] in ["7B", "13B", "30B", "65B"]):
    print(f"\nUsage: {sys.argv[0]} 7B|13B|30B|65B [--remove-f16]\n")
    sys.exit(1)

for i in os.listdir(f"models/{sys.argv[1]}"):
    if i.endswith("ggml-model-f16.bin"):
        ...
```
Added install instructions for the versions of `torch` and `sentencepiece` that are missing from the pip repo on the latest Python 3 - used to get this working on Python 3.11.0
Drop torch, avoid loading the whole file into memory, process files in parallel, and use separate threads for reading and writing.
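The chunked, threaded read/write scheme described above can be sketched in Python. This is a minimal illustration of the idea, not the actual conversion script; `convert_chunk` is a hypothetical stand-in for the real per-chunk work:

```python
import os
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # stream in 1 MiB chunks instead of loading the whole file


def convert_chunk(data: bytes) -> bytes:
    # Hypothetical placeholder for the real per-chunk conversion work.
    return data


def convert_file(src: str, dst: str) -> None:
    # A bounded queue decouples the reader thread from the writer,
    # so reading and writing overlap without unbounded memory use.
    q: "queue.Queue" = queue.Queue(maxsize=8)

    def reader() -> None:
        with open(src, "rb") as f:
            while chunk := f.read(CHUNK):
                q.put(chunk)
        q.put(None)  # sentinel: no more data

    t = threading.Thread(target=reader)
    t.start()
    with open(dst, "wb") as out:
        while (chunk := q.get()) is not None:
            out.write(convert_chunk(chunk))
    t.join()


def convert_all(pairs: list) -> None:
    # One worker per (src, dst) pair: files are converted in parallel,
    # each with its own reader thread feeding its writer.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda p: convert_file(*p), pairs))
```

The bounded queue is the key design choice: it keeps at most a few chunks in flight, so peak memory stays small regardless of file size.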
Adds the --ignore-eos switch, which prevents generation of the end-of-text (EOS) token. This can be useful to avoid unexpected terminations in interactive mode and to force the model...
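One common way such a switch can work is to mask out the EOS logit before sampling, so the token can never be chosen. The following Python sketch illustrates that idea under stated assumptions; the token id and function name are hypothetical, not llama.cpp's actual code:

```python
import math

EOS_TOKEN = 2  # assumed end-of-text token id for illustration


def apply_ignore_eos(logits: list, ignore_eos: bool) -> list:
    # With the switch enabled, force the EOS logit to -inf so any
    # softmax-based sampler assigns it zero probability.
    if ignore_eos:
        logits = logits.copy()
        logits[EOS_TOKEN] = -math.inf
    return logits
```

With the flag off, the logits pass through untouched; with it on, generation simply continues until another stop condition (such as the token limit) is hit.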
I improved the quantize script by adding error handling and allowing multiple models to be selected for quantization at once on the command line. I also converted it to Python for...
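A multi-model loop with per-model error handling might look like the sketch below. This is an illustration only; the `./quantize` invocation and the `models/<SIZE>/ggml-model-*.bin` layout are assumed from the repo's conventions, not taken from the actual script:

```python
import subprocess
import sys


def quantize_models(models: list, quantize_bin: str = "./quantize") -> int:
    # Quantize each requested model in turn; a failure for one model
    # is reported but does not abort the remaining ones.
    failures = 0
    for m in models:
        src = f"models/{m}/ggml-model-f16.bin"
        dst = f"models/{m}/ggml-model-q4_0.bin"
        try:
            subprocess.run([quantize_bin, src, dst, "2"], check=True)
        except (OSError, subprocess.CalledProcessError) as e:
            print(f"quantization failed for {m}: {e}", file=sys.stderr)
            failures += 1
    return failures
```

Returning a failure count (rather than exiting on the first error) lets a caller quantize, say, 7B and 13B in one invocation and still learn which ones failed.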
Tried to address slow weight loading. 7B is okay, but 13B is really slow (several minutes), which makes it hard to experiment and prototype with larger models. Replaced `std::ifstream` with C-style file reading using `fopen`....
I believe this largely fixes the tokenization issues. The example mentioned in https://github.com/ggerganov/llama.cpp/issues/167 as well as my local tests (e.g. "accurately" should tokenize as `[7913, 2486]`) are fixed by it....
https://github.com/ggerganov/llama.cpp/blob/721311070e31464ac12bef9a4444093eb3eaebf7/main.cpp#L980-L983 This can fail to colorize the last `params.n_batch` part of the prompt correctly because `embd` has just been loaded with those tokens, which have not been printed yet.
So, I'm trying to build with CMake on Windows 11 and the thing just stops after it's done loading the model. And apparently, this is a segfault. Yay...