
How to use my own models?

Open KarelWintersky opened this issue 1 year ago • 10 comments

How to use my own models?

KarelWintersky avatar Aug 19 '23 20:08 KarelWintersky

@KarelWintersky if you're using docker, you can edit docker-compose.yml, changing llama-2-7b-chat.bin to your model's name: https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L10-L12

https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L21

Then you'll have to mount it into the container by adding:

    volumes:
      - './models/model-name.bin:/models/model-name.bin'

at the same level as the environment: block (each volumes entry is a host-path:container-path pair, prefixed with a dash). Then you should be able to run docker compose up -d and it should use your model; a consolidated sketch of the edited service follows below.
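
Putting both changes together, the edited service might look roughly like this (model-name.bin is a placeholder for your file; the actual service name and the other keys in docker-compose.yml may differ):

    services:
      llama-gpt-api:
        # image, ports, etc. stay unchanged
        environment:
          MODEL: '/models/model-name.bin'
        volumes:
          - './models/model-name.bin:/models/model-name.bin'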

You might also want to comment out these lines from api/Dockerfile to stop it from downloading the normal model: https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/api/Dockerfile#L13-L17

bolshoytoster avatar Aug 19 '23 22:08 bolshoytoster

I think the best approach is not to comment out or change the files (Dockerfile and docker-compose.yml), but to use them as templates for new models.

The problem is that not all models are compatible with LlamaGPT.

Since LlamaGPT uses llama.cpp, which can run many different models, the only way is to follow the list of compatible models, then download, convert, and try to load each one.

I'm doing some testing here. I'll post an update about it soon.

Regards,

21orangehat avatar Aug 21 '23 12:08 21orangehat

Yes, put the model in your local ./models folder, e.g. ./models/my-own-llm-ggml-chat.bin, and update the environment variables in the docker compose file.

edgar971 avatar Aug 21 '23 14:08 edgar971

For example:

  1. Find a GGML model on HuggingFace, or use one of your own.
  2. Copy the download link if you're using a HuggingFace model.
  3. Update the docker-compose.yml file.

version: '3.6'

services:
  orca-mini-v3-70b:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    restart: on-failure
    volumes:
      - './models:/models'
      - './api:/api'
    ports:
      - 3001:8000
    environment:
      MODEL: '/models/orca-mini-v3-70b.bin'
      MODEL_DOWNLOAD_URL: 'https://huggingface.co/TheBloke/orca_mini_v3_70B-GGML/resolve/main/orca_mini_v3_70b.ggmlv3.q4_0.bin'
    command: '/bin/sh /api/run.sh'

  llama-gpt-ui:
    image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
    ports:
      - 3000:3000
    restart: on-failure
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://orca-mini-v3-70b:8000'
      - 'DEFAULT_MODEL=/models/orca-mini-v3-70b.bin'
      - 'WAIT_HOSTS=orca-mini-v3-70b:8000'
      - 'WAIT_TIMEOUT=3600'
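
With that in place, docker compose up -d should bring both services up. If the model file isn't already in ./models, run.sh is expected to download it from MODEL_DOWNLOAD_URL on first start, so the first boot can take a while for large models.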

edgar971 avatar Aug 21 '23 14:08 edgar971

Hey @edgar971 ,

I tried to run the model that you posted above, but LlamaGPT was not able to load it.

I had the error:

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024

Did you change something in the way llama.cpp runs this model?

Reading the docs for this model, they explain:

Compatibility: Requires llama.cpp commit e76d630 or later, or one of the other tools and libraries listed above.

To use in llama.cpp, you must add the -gqa 8 argument.

21orangehat avatar Aug 21 '23 17:08 21orangehat

Makes sense. You can set that as an env variable in the docker compose file or in the run.sh file.
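
As a sketch, assuming the llama-cpp-python server that run.sh launches reads its settings from uppercased environment variables (n_gqa was added to it for the 70B GGML models; verify this against the version you're running), the environment block could gain one line:

    environment:
      MODEL: '/models/orca-mini-v3-70b.bin'
      N_GQA: '8'   # counterpart of llama.cpp's -gqa 8 flag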

edgar971 avatar Aug 22 '23 01:08 edgar971

I think every question on any platform should first be answered by an AI, provided the question is fully presented.

How long until this comes true on GitHub?

lityrdef avatar Aug 22 '23 19:08 lityrdef

When I try to run a newer model from HuggingFace, the model loads but it seems to somewhat break the UI for new chats.

The bot still responds to questions, but the new-chat prompts go away.

stratus-ss avatar Sep 14 '23 16:09 stratus-ss

@stratus-ss were you able to fix that? Just had this issue trying to use Llama 3 8B.

Ualas avatar Apr 22 '24 23:04 Ualas

@stratus-ss Find below a fix:

llama-gpt/ui/types/openai.ts

Add the lines below for your custom model (the existing enum members and record entries stay in place; only the new ones are added):

export enum OpenAIModelID {
  // ...existing entries...
  LLAMA_3_8b_Q4_K_M = '/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf',
}

export const OpenAIModels: Record<OpenAIModelID, OpenAIModel> = {
  // ...existing entries...
  [OpenAIModelID.LLAMA_3_8b_Q4_K_M]: {
    id: OpenAIModelID.LLAMA_3_8b_Q4_K_M,
    name: 'LLAMA 3 8B',
    maxLength: 12000,
    tokenLimit: 4000,
  },
};
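
For the UI to actually default to the new model, the DEFAULT_MODEL variable in docker-compose.yml presumably needs to match the enum value added above, along these lines:

    llama-gpt-ui:
      environment:
        - 'DEFAULT_MODEL=/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf'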

Ualas avatar Apr 23 '24 18:04 Ualas