llama-gpt

How to use my own models?
@KarelWintersky if you're using Docker, you can edit `docker-compose.yml`, changing `llama-2-7b-chat.bin` to your model's name:
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L10-L12
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L21
Then you'll have to mount it into the container by adding a `volumes:` section just after the `MODEL:` line (note the `./` prefix, which Docker Compose needs in order to treat it as a bind mount rather than a named volume):

```yaml
volumes:
  - './model-name.bin:/models/model-name.bin'
```

Then you should be able to run `docker compose up -d` and it should use your model.
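Putting both changes together, here's a minimal sketch of what the edited service might look like (the service name `llama-gpt-api` and the model file name are placeholders; check your own `docker-compose.yml` for the actual names):

```yaml
# Hypothetical excerpt of docker-compose.yml -- adjust names to your setup.
llama-gpt-api:
  environment:
    MODEL: '/models/model-name.bin'              # was llama-2-7b-chat.bin
  volumes:
    - './model-name.bin:/models/model-name.bin'  # bind-mount your local model file
```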
You might also want to comment out these lines in `api/Dockerfile` to stop it from downloading the default model:
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/api/Dockerfile#L13-L17
I think the best approach is not to comment out or change the files (Dockerfile and docker-compose.yml), but to use them as templates for new models.

The problem is that not all models are compatible with LlamaGPT. Since LlamaGPT uses llama.cpp, it can in principle run any model llama.cpp supports, but the only way to know is to follow the compatible-models guide: download, convert, and try to load.

I'm doing some testing here. I'll post an update soon.

Regards
Yes, put the model in your local `./models` folder, e.g. `./models/my-own-llm-ggml-chat.bin`, and update the environment variables in the Docker Compose file. For example:

- Find a GGML model on Hugging Face, or use your own.
- If using a Hugging Face model, copy its download link.
- Update the `docker-compose.yml` file:
```yaml
version: '3.6'

services:
  orca-mini-v3-70b:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    restart: on-failure
    volumes:
      - './models:/models'
      - './api:/api'
    ports:
      - 3001:8000
    environment:
      MODEL: '/models/orca-mini-v3-70b.bin'
      MODEL_DOWNLOAD_URL: 'https://huggingface.co/TheBloke/orca_mini_v3_70B-GGML/resolve/main/orca_mini_v3_70b.ggmlv3.q4_0.bin'
    command: '/bin/sh /api/run.sh'

  llama-gpt-ui:
    image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
    ports:
      - 3000:3000
    restart: on-failure
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://orca-mini-v3-70b:8000'
      - 'DEFAULT_MODEL=/models/orca-mini-v3-70b.bin'
      - 'WAIT_HOSTS=orca-mini-v3-70b:8000'
      - 'WAIT_TIMEOUT=3600'
```
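With this in place, `docker compose up -d` should bring up both services. The API container's `run.sh` is presumably what fetches `MODEL_DOWNLOAD_URL` into `/models` on first start if the file isn't already present, so a model you've already placed in `./models` should be used as-is.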
Hey @edgar971,

I tried to run the model you posted above, but LlamaGPT was not able to load it. I got this error:

```
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
```

Did you change something in the way llama.cpp runs this model?
Reading the docs for this model, they explain:

> **Compatibility**
>
> Requires llama.cpp commit e76d630 or later. Or one of the other tools and libraries listed above.
>
> To use in llama.cpp, you must add the `-gqa 8` argument.
Makes sense. You can set that as an environment variable in the Docker Compose file or in `run.sh`.
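For example, a minimal sketch, assuming the bundled llama-cpp-python server version still exposes the GGML-era `n_gqa` setting as an `N_GQA` environment variable (it was removed in later GGUF-only releases, so verify against your pinned version):

```yaml
# Hypothetical excerpt -- N_GQA corresponds to llama.cpp's -gqa flag,
# which Llama-2-70B-based GGML models require.
environment:
  MODEL: '/models/orca-mini-v3-70b.bin'
  N_GQA: '8'
```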
I think every question on any platform should first be answered by an AI, provided the question is fully presented. How long until this comes true on GitHub?
When I try to run a newer model from Hugging Face, the model loads but seems to somewhat break the UI for new chats. The bot still responds to questions, but the new-chat prompts go away.
@stratus-ss were you able to fix that? I just had this issue trying to use Llama 3 8B.
@stratus-ss Find below a fix.

In `llama-gpt/ui/types/openai.ts`, add entries for your custom model:

```ts
export enum OpenAIModelID {
  // ...existing model IDs...
  LLAMA_3_8b_Q4_K_M = '/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf',
}

export const OpenAIModels: Record<OpenAIModelID, OpenAIModel> = {
  // ...existing entries...
  [OpenAIModelID.LLAMA_3_8b_Q4_K_M]: {
    id: OpenAIModelID.LLAMA_3_8b_Q4_K_M,
    name: 'LLAMA 3 8B',
    maxLength: 12000,
    tokenLimit: 4000,
  },
};
```
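After editing, you'll likely need to rebuild the UI image (e.g. `docker compose build llama-gpt-ui`) and restart the stack so the new model ID shows up in the model picker.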