Language support

Open PierreVannier opened this issue 1 year ago • 9 comments

Hello there

I'd like to run this project and ingest French documents. It seems to me that the suggested models only work with English documents; am I right? Does anyone have suggestions for running it with documents that are not written in English? I assume one must download a GPT4All-compatible model. Where can these be found? Is one available for French?

Thanks for any pointers.

P.S. This seems to be a frequent question, but so far without any conclusive suggestions or pointers.

PierreVannier avatar May 16 '23 06:05 PierreVannier

Hello Pierre,

You can use https://github.com/bofenghuang/vigogne, which is a French LLM compatible with Llama-CPP.

PierreMory avatar May 16 '23 08:05 PierreMory

I tested a document in the Khmer language ("Cambodia country"), written in Unicode, and it did not work well: I got an invalid token error. I hope this can be made to work. In ChatGPT itself, I can use the language fine.

Traceback (most recent call last):
  File "A:\vscodes\privateGPT\ingest.py", line 62, in main()
  File "A:\vscodes\privateGPT\ingest.py", line 56, in main

diamondbarcode avatar May 16 '23 16:05 diamondbarcode

Hey Pierre, thanks for the heads up! How do I do that exactly? I can't find a ready-to-use vigogne model. Have you done the procedure yourself? Thanks

PierreVannier avatar May 17 '23 05:05 PierreVannier

You can find more details on this page: https://github.com/bofenghuang/vigogne/tree/main/vigogne/inference#llamacpp This tutorial (in French) explains how to create the model, but I downloaded it directly from this Discord channel: https://discord.com/channels/1092039071435599874/1101966544906485800

I also recommend changing the model used for embeddings. I get better results with SentenceTransformers (https://python.langchain.com/en/latest/modules/models/text_embedding/examples/sentence_transformers.html). I used this multilingual model: paraphrase-multilingual-mpnet-base-v2 (https://www.sbert.net/docs/pretrained_models.html)
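As a rough illustration of the embeddings swap described above, here is a minimal sketch. It assumes a LangChain release from around the time of this thread, where `HuggingFaceEmbeddings` could be imported from `langchain.embeddings` (newer releases have moved this class). The `EMBEDDINGS_MODEL_NAME` variable and both helper functions are hypothetical names chosen for this example; they are not taken from the posts.

```python
import os

# The multilingual model named in the post above.
DEFAULT_MODEL = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"


def embeddings_model_name(env=os.environ):
    """Return the embeddings model to load, preferring an override from env."""
    # EMBEDDINGS_MODEL_NAME is a hypothetical variable name for this sketch.
    return env.get("EMBEDDINGS_MODEL_NAME", DEFAULT_MODEL)


def build_embeddings(model_name):
    """Create a LangChain embeddings object backed by sentence-transformers."""
    # Imported lazily so the helper above works without langchain installed.
    # Assumption: this import path matches your installed LangChain version.
    from langchain.embeddings import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name=model_name)
```

Calling `build_embeddings(embeddings_model_name())` would then give an object to use wherever the project creates its embeddings. The same model has to be used for ingestion and for querying, so documents ingested with a different embeddings model need to be ingested again.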

PierreMory avatar May 17 '23 09:05 PierreMory

Does somebody know a compatible German LLM?

danielwiegand avatar May 17 '23 09:05 danielwiegand

Wow, great content, Pierre! I have more than enough material to tinker with for another couple of weekends! 😆 I'll let you know how it goes. Thanks a lot

PierreVannier avatar May 17 '23 11:05 PierreVannier

@PierreMory, I followed the Pere Conteur tutorial and ingested a bunch of French PDFs, but is it normal that it replies in English when queried?

PierreVannier avatar May 19 '23 07:05 PierreVannier

I encountered the same problem. I managed to get French answers by customizing the prompt given to the chain. Here is my code:

from langchain.prompts import PromptTemplate

prompt_template = """Instructions: Use the following pieces of context to answer the question at the end. If you cannot answer with the given context, or if you don't know the answer, just say that you don't know, don't try to make up an answer. Always answer in French.

Context:
{context}

Question: 
{question}

Answer (in French):"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["question", "context"]
)

Then, later in the code:

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT},
    )

With that prompt you should get French answers!
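To see roughly what the model receives with this prompt, the template can be filled in with plain `str.format`, without LangChain. This is only a sketch: it assumes the retrieved chunks are simply joined into `{context}`, and the `render_prompt` helper and the sample French text are made up for illustration.

```python
# Template text copied from the post above.
prompt_template = """Instructions: Use the following pieces of context to answer the question at the end. If you cannot answer with the given context, or if you don't know the answer, just say that you don't know, don't try to make up an answer. Always answer in French.

Context:
{context}

Question:
{question}

Answer (in French):"""


def render_prompt(context, question):
    """Substitute the two placeholders, as an f-string style template would."""
    return prompt_template.format(context=context, question=question)


# Hypothetical context and question, for illustration only.
example = render_prompt(
    context="La tour Eiffel mesure 330 mètres.",
    question="Quelle est la hauteur de la tour Eiffel ?",
)
print(example)
```

Printing the rendered text is a quick way to check that the language instruction appears both at the top and right before the answer, which is where it is most likely to influence the reply.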

PierreMory avatar May 20 '23 14:05 PierreMory

Hi Pierre, sorry for the late reply. Yes, I figured this out as well. It's working now. Have you tried adding documents after the initial ingestion phase?

PierreVannier avatar May 23 '23 08:05 PierreVannier

No. When I want to add more data, I drop the DB and run ingest.py again. It is not a real problem, as the sentence embedding model I am using is pretty fast.
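The drop-and-re-ingest routine could be scripted along these lines. This is a sketch under assumptions: the vector store is taken to live in a directory called `db` next to `ingest.py`, and both function names are hypothetical. Adjust the path to wherever your setup persists its database.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def drop_vector_store(persist_dir):
    """Delete the persisted vector store directory; return True if it existed."""
    path = Path(persist_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


def rebuild(persist_dir="db", ingest_script="ingest.py"):
    """Drop the store, then re-run ingestion over all source documents."""
    # Assumption: "db" and "ingest.py" are relative to the current directory.
    drop_vector_store(persist_dir)
    # check=True raises CalledProcessError if the ingestion script fails.
    subprocess.run([sys.executable, ingest_script], check=True)
```

Running `rebuild()` from the project directory would then rebuild the whole index from scratch, which stays practical as long as embedding the documents is fast.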

PierreMory avatar May 24 '23 12:05 PierreMory

Hello, how can I use it with Italian too?

Thanks

NukeDev avatar Jun 11 '23 20:06 NukeDev

Could anyone post an updated tutorial on how to use a French LLM with privateGPT? The PereConteur tutorial doesn't seem to work here. Can we download the .bin (and if so, where) and only change the .env?

Mer0me avatar Sep 11 '23 09:09 Mer0me