llama-gpt

How to use my own models?
@KarelWintersky if you're using Docker, you can edit `docker-compose.yml`, changing `llama-2-7b-chat.bin` to your model's name:
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L10-L12
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/docker-compose.yml#L21
Then you'll have to mount it into the container by adding a `volumes:` section just after the `MODEL:` line (note the `./` prefix, which Docker Compose needs in order to treat it as a bind mount rather than a named volume):

```yaml
volumes:
  - './model-name.bin:/models/model-name.bin'
```

Then you should be able to run `docker compose up -d` and it should use your model.
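Putting both changes together, here's a minimal sketch of what the edited service might look like (the service name `llama-gpt-api` and the model file name are placeholders; check your own `docker-compose.yml` for the actual names):

```yaml
# Hypothetical excerpt of docker-compose.yml -- adjust names to your setup.
llama-gpt-api:
  environment:
    MODEL: '/models/model-name.bin'              # was llama-2-7b-chat.bin
  volumes:
    - './model-name.bin:/models/model-name.bin'  # bind-mount your local model file
```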
You might also want to comment out these lines in `api/Dockerfile` to stop it from downloading the default model:
https://github.com/getumbrel/llama-gpt/blob/3553e20a3d4f8605040e4edab76f8094cb545da3/api/Dockerfile#L13-L17
I think the best approach is not to comment out or change the files (Dockerfile and docker-compose.yml), but to use them as templates for new models.

The problem is that not all models are compatible with LlamaGPT. Since LlamaGPT uses llama.cpp, it can in principle run any model llama.cpp supports, but the only way to know is to follow the compatible-models guide: download, convert, and try to load.

I'm doing some testing here. I'll post an update soon.

Regards
Yes, put the model in your local `./models` folder, e.g. `./models/my-own-llm-ggml-chat.bin`, and update the environment variables in the Docker Compose file. For example:

- Find a GGML model on Hugging Face, or use your own.
- If using a Hugging Face model, copy its download link.
- Update the `docker-compose.yml` file:
```yaml
version: '3.6'

services:
  orca-mini-v3-70b:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    restart: on-failure
    volumes:
      - './models:/models'
      - './api:/api'
    ports:
      - 3001:8000
    environment:
      MODEL: '/models/orca-mini-v3-70b.bin'
      MODEL_DOWNLOAD_URL: 'https://huggingface.co/TheBloke/orca_mini_v3_70B-GGML/resolve/main/orca_mini_v3_70b.ggmlv3.q4_0.bin'
    command: '/bin/sh /api/run.sh'

  llama-gpt-ui:
    image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
    ports:
      - 3000:3000
    restart: on-failure
    environment:
      - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
      - 'OPENAI_API_HOST=http://orca-mini-v3-70b:8000'
      - 'DEFAULT_MODEL=/models/orca-mini-v3-70b.bin'
      - 'WAIT_HOSTS=orca-mini-v3-70b:8000'
      - 'WAIT_TIMEOUT=3600'
```
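With this in place, `docker compose up -d` should bring up both services. The API container's `run.sh` is presumably what fetches `MODEL_DOWNLOAD_URL` into `/models` on first start if the file isn't already present, so a model you've already placed in `./models` should be used as-is.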
Hey @edgar971,

I tried to run the model you posted above, but LlamaGPT was not able to load it. I got this error:

```
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
```

Did you change something in the way llama.cpp runs this model?
Reading the docs for this model, they explain:

> **Compatibility**
>
> Requires llama.cpp commit e76d630 or later. Or one of the other tools and libraries listed above.
>
> To use in llama.cpp, you must add the `-gqa 8` argument.
Makes sense. You can set that as an environment variable in the Docker Compose file or in `run.sh`.
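For example, a minimal sketch, assuming the bundled llama-cpp-python server version still exposes the GGML-era `n_gqa` setting as an `N_GQA` environment variable (it was removed in later GGUF-only releases, so verify against your pinned version):

```yaml
# Hypothetical excerpt -- N_GQA corresponds to llama.cpp's -gqa flag,
# which Llama-2-70B-based GGML models require.
environment:
  MODEL: '/models/orca-mini-v3-70b.bin'
  N_GQA: '8'
```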
I think every question on any platform should first be answered by an AI, provided the question is fully presented. How long until this comes true on GitHub?
When I try to run a newer model from Hugging Face, the model loads but seems to somewhat break the UI for new chats. The bot still responds to questions, but the new-chat prompts go away.
@stratus-ss were you able to fix that? I just had this issue trying to use Llama 3 8B.
@stratus-ss Find below a fix.

In `llama-gpt/ui/types/openai.ts`, add entries for your custom model:

```ts
export enum OpenAIModelID {
  // ...existing model IDs...
  LLAMA_3_8b_Q4_K_M = '/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf',
}

export const OpenAIModels: Record<OpenAIModelID, OpenAIModel> = {
  // ...existing entries...
  [OpenAIModelID.LLAMA_3_8b_Q4_K_M]: {
    id: OpenAIModelID.LLAMA_3_8b_Q4_K_M,
    name: 'LLAMA 3 8B',
    maxLength: 12000,
    tokenLimit: 4000,
  },
};
```
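After editing, you'll likely need to rebuild the UI image (e.g. `docker compose build llama-gpt-ui`) and restart the stack so the new model ID shows up in the model picker.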