llama.cpp
llama.exe doesn't handle relative file paths in Windows correctly
Please include the ggml-model-q4_0.bin model to actually run the code:
% make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread main.cpp ggml.o utils.o -o main -framework Accelerate
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate
./main -h
usage: ./main [options]
options:
-h, --help show this help message and exit
-s SEED, --seed SEED RNG seed (default: -1)
-t N, --threads N number of threads to use during computation (default: 4)
-p PROMPT, --prompt PROMPT
prompt to start generation with (default: random)
-n N, --n_predict N number of tokens to predict (default: 128)
--top_k N top-k sampling (default: 40)
--top_p N top-p sampling (default: 0.9)
--repeat_last_n N last n tokens to consider for penalize (default: 64)
--repeat_penalty N penalize repeat sequence of tokens (default: 1.3)
--temp N temperature (default: 0.8)
-b N, --batch_size N batch size for prompt processing (default: 8)
-m FNAME, --model FNAME
model path (default: models/llama-7B/ggml-model.bin)
main: seed = 1678619388
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: failed to open './models/7B/ggml-model-q4_0.bin'
main: failed to load model from './models/7B/ggml-model-q4_0.bin'
My pre-signed URL to download the model weights was broken.
Windows help me please
Did you follow the instructions in the README.md to download, convert, and quantize the model? The model is not included in the repo.
I tried everything... I did not see separate instructions for Windows (via CMake) =(
It is telling you it can't find the model in ./models/7B. Is the ggml-model-q4_0.bin file in that directory?
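If you want to rule out a missing file quickly, here is a minimal standalone sketch (assuming a C++ compiler and the path from the log above; this is not llama.cpp's actual loader code):

// check_model.cpp -- standalone sketch, not llama.cpp code
#include <cstdio>

int main() {
    // Relative paths are resolved against the current working
    // directory, not the directory containing the executable --
    // a common cause of "failed to open" errors.
    const char *path = "./models/7B/ggml-model-q4_0.bin";
    std::FILE *f = std::fopen(path, "rb");
    if (!f) {
        std::printf("cannot open %s - file missing or cwd is wrong\n", path);
        return 1;
    }
    std::printf("found %s\n", path);
    std::fclose(f);
    return 0;
}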
I don't use PowerShell, and I don't know why ./Release/llama.exe is yellow (I assume that means it exists?), but he is using forward slashes, and Windows doesn't normally use those, so I don't know if PS has some fancy way of handling the correct slashes or not. Also, does CMake create a Release folder just for the .exe, or are the models in there too? Anyway, I am going to assume that folder doesn't even exist because he's using the wrong slashes.
Well, PowerShell supports forward slashes just fine, but on Windows the path argument to llama.exe is passed verbatim, i.e. it's up to llama.exe to handle parsing the relative file path correctly.
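For what it's worth, the Windows C runtime generally accepts forward slashes in paths, so './models/7B/...' should open fine if the file actually exists. If llama.exe wanted to normalize the separators explicitly, C++17 std::filesystem can do it. A standalone sketch (an illustration, not the actual llama.exe code):

// path_check.cpp -- standalone sketch, assuming C++17
#include <cstdio>
#include <filesystem>

namespace fs = std::filesystem;

int main(int argc, char **argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model-path>\n", argv[0]);
        return 1;
    }
    // fs::path accepts '/' and '\\' interchangeably on Windows;
    // make_preferred() rewrites separators to the native one.
    fs::path model = fs::absolute(argv[1]);
    model.make_preferred();
    std::printf("resolved path: %s\n", model.string().c_str());
    std::printf("exists: %s\n", fs::exists(model) ? "yes" : "no");
    return 0;
}

Running it with ./models/7B/ggml-model-q4_0.bin from the wrong working directory would show exists: no, which is consistent with the "failed to open" error in the original log.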
Reopened and corrected the issue title.
Not sure if related, but the ggml-model-q4_0.bin I am getting is only 296 kB.
There is no error.
C:\llama\models\7B>quantize ggml-model-f16.bin ggml-model-q4_0.bin 2
llama_model_quantize: loading model from 'ggml-model-f16.bin'
llama_model_quantize: n_vocab = 32000
llama_model_quantize: n_ctx = 512
llama_model_quantize: n_embd = 4096
llama_model_quantize: n_mult = 256
llama_model_quantize: n_head = 32
llama_model_quantize: n_layer = 32
llama_model_quantize: f16 = 1
tok_embeddings.weight - [ 4096, 32000], type = f16
C:\llama\models\7B>
You should check your model file; it's too small. I got this error because I misspelled the model name...
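A quantized 7B model should be on the order of 4 GB, so a size sanity check catches this early. A standalone sketch (assuming C++17; the path and the 1 GB threshold are illustrative, not values from llama.cpp):

// size_check.cpp -- standalone sketch, assuming C++17
#include <cstdint>
#include <cstdio>
#include <filesystem>

int main() {
    const char *path = "ggml-model-q4_0.bin";  // illustrative path
    std::error_code ec;
    const auto size = std::filesystem::file_size(path, ec);
    if (ec) {
        std::fprintf(stderr, "cannot stat %s: %s\n", path, ec.message().c_str());
        return 1;
    }
    // A 7B q4_0 model is roughly 4 GB; a few hundred kB means the
    // quantize step aborted early, as in the truncated log above.
    if (size < (1ull << 30)) {  // 1 GB threshold, illustrative
        std::fprintf(stderr, "%s is only %ju bytes - quantization likely failed\n",
                     path, (std::uintmax_t) size);
        return 1;
    }
    std::printf("%s looks plausible (%ju bytes)\n", path, (std::uintmax_t) size);
    return 0;
}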
Check the downloaded files against the checksums in the SHA256 file. Please reopen if the issue still persists.