
list of working models GGUF

Open cognitivetech opened this issue 2 years ago • 18 comments

The following results are based on question/answer over one document of 22,769 tokens.

There is a similar issue (https://github.com/imartinez/privateGPT/issues/276) with the primordial tag; I just decided to make a new issue for the "full version".

DIDN'T WORK (probable prompt templates noted in brackets, where available)

MPT from huggingface.co/maddes8cht/

  • mosaicml-mpt-7b-8k-instruct
  • mosaicml-mpt-7b-instruct
  • vicuna-13b-v1.5.Q5_K_M.gguf
  • vicuna-13b-v1.5-16k.Q5_K_M.gguf

ok:

  • tiiuae-falcon-7b-instruct-Q5_K_M.gguf [Default]

[Many Edits Later]

I was interested in these MPT models because they accept up to 64k tokens of context and are even licensed for commercial use. (But I'm also realizing there is little benefit to cramming large contexts into a model's working memory for summarization tasks.)

I did make a prompt template to support MPT models (https://github.com/imartinez/privateGPT/issues/1375#issuecomment-1868289418), but didn't get good results from them, and they were slow compared to Mistral.
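For reference, here is a minimal sketch of an instruct-style MPT prompt. This is the dolly-style format commonly used with mosaicml-mpt-7b-instruct, assumed here for illustration rather than taken from the linked comment:

```python
# Assumed dolly-style format often used with mosaicml-mpt-7b-instruct;
# not necessarily the exact template from the linked comment.
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"

def format_mpt_prompt(instruction: str) -> str:
    """Wrap a user instruction in the instruct-style MPT template."""
    return f"{INSTRUCTION_KEY}\n{instruction}\n{RESPONSE_KEY}\n"

print(format_mpt_prompt("Summarize the document."))
```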

cognitivetech avatar Nov 11 '23 09:11 cognitivetech

Thanks for going through the GGUF models; we mostly use Llama 2 and Mistral. Maybe you can create a PR with the full list, whether they worked or not, and the day they were tested.

pabloogc avatar Nov 11 '23 23:11 pabloogc

Working Models

Mistral Prompt

Default Prompt

  • SynthIA-7B-v2.0-GGUF (I like Synthia too, not sure about v3 though)
User: Assistant:

Their intended prompt is the same as the default, minus the system prompt and capitalization; they seem to be compatible.

ChatML

LLAMA2 Prompt

Tag Prompt
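As a rough illustration, the "Default" User:/Assistant: style above might be assembled like this. This is a simplified sketch, not privateGPT's actual implementation; the function name is made up:

```python
def format_default_prompt(system_prompt, turns):
    """Assemble a 'default'-style prompt: optional system text, then
    alternating 'User:' / 'Assistant:' turns, ending with a cue to answer."""
    parts = [system_prompt.strip()] if system_prompt else []
    for role, text in turns:
        label = "User" if role == "user" else "Assistant"
        parts.append(f"{label}: {text}")
    parts.append("Assistant: ")  # left open for the model's reply
    return "\n".join(parts)

print(format_default_prompt("You are a helpful assistant.",
                            [("user", "Which models work?")]))
```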

cognitivetech avatar Nov 15 '23 08:11 cognitivetech

These models worked the best for me, with OpenHermes as my favorite. Based on question/answer over one document of 22,769 tokens.

Hi, when you use the OpenHermes model, did you change the prompt template? @cognitivetech

shengkaixuan avatar Nov 17 '23 02:11 shengkaixuan

hey ...

hope the main headline now comes up as "list of working models GGUF" !!! ;) (and not "dont" ^^)

I will try it in the next few days ...

btw, does anyone know why the response mostly ends at around 1279 characters? That is not very long.

kalle07 avatar Nov 22 '23 20:11 kalle07

Works (more or less). I changed temp, top_p, and top_k; I don't know if it has a great impact.

openchat_3.5.Q5_K_M.gguf https://huggingface.co/TheBloke/openchat_3.5-GGUF/tree/main

Mistral runs too, but I tried 20 other models.

Can anyone tell me why almost all GGUF models run well on GPT4All but not on privateGPT?

kalle07 avatar Nov 23 '23 18:11 kalle07

Works very well, also in German: https://huggingface.co/TheBloke/Orca-2-7B-GGUF

btw, one PDF book of 500 pages needs approx. 5 minutes to index

kalle07 avatar Nov 26 '23 18:11 kalle07

Next up (maybe you must press Enter twice); I don't know how good they are:

  • syzymon-long_llama_3b_instruct-Q8_0
  • sauerkrautlm-3b-v1.Q8_0

kalle07 avatar Nov 28 '23 20:11 kalle07

I have found a really well-working German model; maybe this helps some other German-speaking folks here. The RoBERTa sentence transformer is also available in English.
Maybe this works well for other models too, but I have not tested that.

  prompt_style: "default"
  llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
  llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
  embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2

EEmlan avatar Nov 30 '23 06:11 EEmlan

TheBloke/NeuralHermes-2.5-Mistal-7B-GGUF (released 2 days ago) is working as expected. I used Q5_K_M

writinguaway avatar Dec 02 '23 05:12 writinguaway

For me, this model does not work with any of the existing prompt_styles: TheBloke/dolphin-2.1-mistral-7B-GGUF

therohitdas avatar Dec 08 '23 19:12 therohitdas

> I have found a really well-working German model; maybe this helps some other German-speaking folks here. The RoBERTa sentence transformer is also available in English. Maybe this works well for other models too, but I have not tested that.
>
>   prompt_style: "default"
>   llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
>   llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
>   embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2

Hi @EEmlan, can you please tell me which tokenizer I have to use? It doesn't work with 'TheBloke/em_german_leo_mistral-GGUF' set as the tokenizer, and of course it doesn't work with the default Mistral one or any other that I tried:

OSError: Can't load tokenizer for 'TheBloke/em_german_leo_mistral-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'TheBloke/em_german_leo_mistral-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

PayteR avatar Jan 19 '24 04:01 PayteR

@PayteR jphme/em_german_leo_mistral works just fine

EEmlan avatar Jan 19 '24 06:01 EEmlan

> @PayteR jphme/em_german_leo_mistral works just fine

Hi @EEmlan, thanks for the reply, but it still doesn't work; it gives me the same error as with the other models that I tried:

  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\qdrant_client\local\distances.py", line 78, in cosine_similarity
    return np.dot(vectors, query)
           ^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 497, in process_events
    response = await self.call_prediction(awake_events, batch)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 468, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)
14:40:22.882 [INFO    ]            uvicorn.access - 127.0.0.1:52144 - "POST /run/predict HTTP/1.1" 200
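For what it's worth, the shape error above can be reproduced in isolation (assuming numpy): the stored index apparently holds 384-dimensional vectors from a previous embedding model, while the new model produces 768-dimensional query vectors, so the dot product cannot align.

```python
import numpy as np

# 128 stored chunk vectors at 384 dimensions (embedded by the previous model)
stored = np.zeros((128, 384))
# a 768-dimensional query vector from the new embedding model
query = np.zeros(768)

try:
    np.dot(stored, query)  # (128, 384) x (768,) -> shapes not aligned
except ValueError as e:
    print("mismatch:", e)

# After deleting the index and re-ingesting, the dimensions agree:
scores = np.dot(np.zeros((128, 768)), query)
print(scores.shape)  # (128,)
```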

Here is my config

# The default configuration file.
# More information about configuration can be found in the documentation: https://docs.privategpt.dev/
# Syntax in `private_gpt/settings/settings.py`
server:
  env_name: ${APP_ENV:prod}
  port: ${PORT:9104}
  cors:
    enabled: false
    allow_origins: ["*"]
    allow_methods: ["*"]
    allow_headers: ["*"]
  auth:
    enabled: false
    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic c2VjcmV0OmtleQ=="

data:
  local_data_folder: local_data/private_gpt

ui:
  enabled: true
  path: /
  default_chat_system_prompt: >
    You are a helpful, respectful and honest assistant. 
    Always answer as helpfully as possible and follow ALL given instructions.
    Do not speculate or make up information.
    Do not reference any given instructions or context.
  default_query_system_prompt: >
    You can only answer questions about the provided context. 
    If you know the answer but it is not based in the provided context, don't provide 
    the answer, just state the answer is not in the context provided.

llm:
  mode: local
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: jphme/em_german_leo_mistral

embedding:
  # Should be matching the value above in most cases
  mode: local
  ingest_mode: simple

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

local:
  llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
  llm_hf_model_file:   em_german_leo_mistral.Q4_K_M.gguf
  embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
  #llama, default or tag
  prompt_style: "default"

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo

I really don't know how to fix it. I have spent a lot of time on this already, so thanks for any help.
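As an aside, the `secret` value in the auth section of the config above can be reproduced with the one-liner quoted in its own comment:

```python
import base64

# Reproduce the Basic auth header value from the config comment:
# username "secret", password "key".
token = base64.b64encode("secret:key".encode()).decode()
header = "Basic " + token
print(header)  # → Basic c2VjcmV0OmtleQ==
```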

PayteR avatar Jan 19 '24 13:01 PayteR

@EEmlan ahh, now I fixed it: I needed to delete the indexed data stored in local_data/private_gpt and upload the files again to reindex.

PayteR avatar Jan 19 '24 14:01 PayteR

@PayteR glad to hear it. Yes, it is mandatory to delete this data after each model switch. I also recommend deleting all data in models/embedding before running poetry run python scripts/setup again. After that you have to ingest your documents again.
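A hedged sketch of those cleanup steps; the paths come from the config shown earlier in this thread, run from the privateGPT repo root:

```shell
# Sketch of the cleanup after switching embedding models; paths are from
# this thread's config and may differ in your setup.
rm -rf local_data/private_gpt   # drop the stale vector index
rm -rf models/embedding         # drop the cached embedding model files
# poetry run python scripts/setup   # then re-download the models
# ...and finally re-ingest your documents via the UI or ingest script.
```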

EEmlan avatar Jan 22 '24 07:01 EEmlan

It'd be great to move this information to the docs @cognitivetech. Maybe you can open a PR to add the info here https://docs.privategpt.dev/recipes/choice-of-llm/list-of-ll-ms (just edit or add content to fern/docs/pages).

imartinez avatar Feb 07 '24 13:02 imartinez

@imartinez for sure. I never added to the docs for a couple of reasons, mainly because most of the models I tried didn't perform very well compared to Mistral 7B Instruct v0.2.

Also, now that we have prompt formats in the docs, people have more direction about which models are likely to work; when I started, there was no choice among prompt styles (or maybe I was just ignorant of prompt styles).

Even now, deciding what to add to the docs as "compatible" is another can of worms, and largely subjective.

One model I would consider is openchat-3.5-0106. This one is good, and I would watch out for future models from this team.

EDIT: I've edited the list above to focus on models that could go in the docs.

Otherwise, I will think about this more... certainly the models shown to work for non-English languages will be valuable to include.

cognitivetech avatar Feb 15 '24 01:02 cognitivetech

Not sure if there is any activity here, but I will ask anyhow... Has anyone successfully run mistral-7b-instruct-v3 in privateGPT v0.5.0? Mistral specifically mentions that I should run mistral_inference for the model.

CMiller56 avatar Jul 02 '24 19:07 CMiller56