Darrin Hodges
Also tried mpt-7b-instruct.ggmlv3.q8_0.bin and got: `gptj_model_load: invalid model file 'models/mpt-7b-instruct.ggmlv3.q8_0.bin' (bad vocab size 2007 != 4096)`
Have been getting similar errors with various models as per below; the error is the same: `gptj_model_load: invalid model file 'models/mpt-7b-instruct.ggmlv3.q8_0.bin' (bad vocab size 2007 != 4096)`
for reference, after running the requirements, I still had to install the following (on a clean environment):

- python -m pip install python-dotenv
- pip install tqdm
- pip install langchain...
> for reference, after running the requirements, I still had to install the following (on clean environment):
>
> * python -m pip install python-dotenv
> ...
> ```bash
> #!/bin/bash
> export LLAMA_CUBLAS=1
> source ~/anaconda3/bin/activate
> # check if venv virtual env exists
> if conda info --envs | grep -q "venv"
> then
>     echo...
> ```
this is where it fails:

```
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c llama.cpp -o llama.o...
```
ok, installing the latest NVIDIA toolkit (12.1) has allowed llama-cpp-python to build correctly; it seems the Ubuntu packages are somewhat out of date. I also had to edit /etc/security/limits.conf to raise the...
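For anyone else hitting the build failure: rather than a custom activation script, the cuBLAS build can also be forced through pip. A minimal sketch, assuming the CUDA 12.1 toolkit is installed under /usr/local/cuda-12.1 (the `CMAKE_ARGS`/`FORCE_CMAKE` variables are the ones llama-cpp-python documented for cuBLAS builds; adjust the path to your install):

```shell
# Make sure nvcc from the new toolkit is found first (path is an assumption).
export PATH=/usr/local/cuda-12.1/bin:$PATH

# Rebuild llama-cpp-python from source against cuBLAS.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```

`--no-cache-dir` matters here: without it pip may reuse a previously built wheel that was compiled without GPU support.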
ok, got it working with n_batch 2000, not as fast as a previous poster but better than before

```
Using embedded DuckDB with persistence: data will be stored in: db...
```
where the previous result had:

```
llama_model_load_internal: [cublas] offloading 12 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 2722 MB
```

is it GPU dependent? the one used is 12GB, would a...
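From those log lines one can do a rough per-layer VRAM estimate to guess how many layers a 12GB card could take. This is back-of-envelope only: it ignores the KV cache and scratch buffers, so the real ceiling is lower.

```python
# Per-layer VRAM from the log above: 2722 MB reported for 12 offloaded layers.
total_vram_mb = 2722
offloaded_layers = 12
mb_per_layer = total_vram_mb / offloaded_layers  # ~227 MB per layer

# Naive ceiling for a 12 GB card, ignoring KV cache and scratch buffers.
budget_mb = 12 * 1024
max_layers = int(budget_mb // mb_per_layer)
print(f"{mb_per_layer:.1f} MB/layer, ~{max_layers} layers fit")
```

So on paper a 12GB card could hold far more than 12 layers of this model; in practice the usable number is lower once the context buffers are allocated.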
thanks DanielusG, I tried increasing the layers, the timings didn't change much

```
24 layers
llama_print_timings: load time = 16568.45 ms
llama_print_timings: sample time = 36.19 ms / 64 runs...
```