
Generate Embeddings

upulsen opened this issue 1 year ago · 10 comments

Hi @AndriyMulyar, thanks for all the hard work in making this available. I was wondering whether there's a way to generate embeddings using this model so we can do question answering over a custom set of documents? I feel like that would be a great addition, especially in an enterprise context.

upulsen avatar Apr 03 '23 10:04 upulsen

I'd really like to know this as well. Maybe someone is working on an OpenAI-like library for Python?

Queentessence999 avatar Apr 03 '23 12:04 Queentessence999

See the gpt4all README for the new official bindings. Getting embeddings out is high on the priority list.


AndriyMulyar avatar Apr 03 '23 13:04 AndriyMulyar

@AndriyMulyar so this support is not yet developed, right? Any clue about dates? I found nothing in the gpt4all readme

sime2408 avatar Apr 06 '23 13:04 sime2408

@AndriyMulyar Just wondering if there is any progress on getting embeddings. Really looking forward to this!

luciameng1989 avatar Apr 27 '23 02:04 luciameng1989

It is on the priority list!


AndriyMulyar avatar Apr 27 '23 02:04 AndriyMulyar

In the meantime, check out Sentence-BERT! It's a high-quality, free-to-use embedding model.

AndriyMulyar avatar Apr 27 '23 02:04 AndriyMulyar
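Once you have embeddings from a model like Sentence-BERT, question answering over documents boils down to comparing vectors, usually with cosine similarity. A dependency-free sketch with made-up 4-dimensional vectors (real models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: a query and two document chunks.
query = [0.9, 0.1, 0.0, 0.2]
chunk_about_topic = [0.8, 0.2, 0.1, 0.3]
chunk_off_topic = [0.0, 0.9, 0.8, 0.1]

# The on-topic chunk scores higher, so it would be retrieved first.
print(cosine_similarity(query, chunk_about_topic) > cosine_similarity(query, chunk_off_topic))  # True
```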

It would indeed be great to have a possibility to do embeddings based on gpt4all. Any progress on this?

marc-dsalab avatar May 19 '23 06:05 marc-dsalab

@marc-dsalab you can use some other models, for example (imports added and the missing `retriever` defined so the snippet runs as written):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
retriever = db.as_retriever()  # was used below without being defined
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
```

sime2408 avatar May 19 '23 06:05 sime2408
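Under the hood, the Chroma retriever in a setup like this just stores (embedding, text) pairs and returns the k chunks nearest to the query embedding. A toy in-memory version of that idea in plain Python (a sketch, not ChromaDB's actual implementation):

```python
import math

class ToyVectorStore:
    """Minimal in-memory stand-in for a vector DB retriever."""
    def __init__(self):
        self.entries = []  # list of (embedding, text) pairs

    def add(self, embedding, text):
        self.entries.append((embedding, text))

    def retrieve(self, query_embedding, k=2):
        def score(entry):
            emb, _ = entry
            dot = sum(a * b for a, b in zip(query_embedding, emb))
            norm_q = math.sqrt(sum(a * a for a in query_embedding))
            norm_e = math.sqrt(sum(b * b for b in emb))
            return dot / (norm_q * norm_e)
        ranked = sorted(self.entries, key=score, reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about cats")
store.add([0.0, 1.0], "chunk about finance")
store.add([0.9, 0.1], "another cat chunk")

# Both cat chunks rank ahead of the finance chunk for a cat-like query.
print(store.retrieve([1.0, 0.1], k=2))
```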

> @marc-dsalab you can use some other models, for example: […]

That works well. Yet, it would be nice to have "native" gpt4all embeddings. ;)

marc-dsalab avatar May 23 '23 09:05 marc-dsalab

Any update on embeddings in GPT4All? I'm a long-time C# dev. Are you planning C# bindings? Also, I'm not clear: people suggest using other models for creating embeddings. Are there C# bindings for that? Can anybody explain? I see that embeddings are created with a HuggingFaceEmbeddings model. Then, I assume it is a Chroma vector database... and then what?

```python
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
```

securigy avatar Jun 04 '23 19:06 securigy
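To answer the "and then what?" part: with `chain_type="stuff"`, the retrieved chunks are simply stuffed into a prompt template together with the question, and that single prompt goes to the LLM. A plain-Python sketch of the flow, with hypothetical stand-ins for the retriever and the GPT4All call:

```python
def retrieve(question, k=2):
    # Stand-in for db.as_retriever(): would embed the question
    # and return the k nearest document chunks.
    return ["GPT4All is a local LLM runner.", "Embeddings map text to vectors."]

def llm(prompt):
    # Stand-in for the GPT4All model call; echoes part of the context
    # to show the model only sees what was stuffed into the prompt.
    return "answer based on: " + prompt.split("Context:")[1].strip()[:30]

def stuff_chain(question):
    chunks = retrieve(question)
    prompt = (
        "Answer the question using only the context below.\n"
        "Context:\n" + "\n".join(chunks) + "\n"
        "Question: " + question
    )
    return llm(prompt)

print(stuff_chain("What is GPT4All?"))
```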

> @marc-dsalab you can use some other models, for example: […]
>
> That works well. Yet, it would be nice to have "native" gpt4all embeddings. ;)

Hi - where did you get the function "HuggingFaceEmbeddings" and what did you use as your "retriever" variable?

TomasMiloCA avatar Jun 06 '23 19:06 TomasMiloCA

@TomasMiloCA HuggingFaceEmbeddings is from the langchain library; the retriever comes from ChromaDB. Here's the actual method from my program:

```python
def process_database_question(database_name, llm):
    embeddings = OpenAIEmbeddings() if openai_use else HuggingFaceEmbeddings(model_name=ingest_embeddings_model)
    persist_dir = f"./db/{database_name}"
    db = Chroma(persist_directory=persist_dir, embedding_function=embeddings, client_settings=Settings(
        chroma_db_impl='duckdb+parquet',
        persist_directory=persist_dir,
        anonymized_telemetry=False
    ))

    retriever = db.as_retriever(search_kwargs={"k": ingest_target_source_chunks if ingest_target_source_chunks else args.ingest_target_source_chunks})

    template = """You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question.
    Provide a conversational answer based on the context provided. If you can't find the answer in the context below, just say
    "Hmm, I'm not sure." Don't try to make up an answer. If the question is not related to the context, politely respond
    that you are tuned to only answer questions that are related to the context.

    Question: {question}
    =========
    {context}
    =========
    Answer:"""
    question_prompt = PromptTemplate(template=template, input_variables=["question", "context"])

    qa = ConversationalRetrievalChain.from_llm(llm=llm, condense_question_prompt=question_prompt, retriever=retriever, chain_type="stuff", return_source_documents=not args.hide_source)
    return qa
```

sime2408 avatar Jun 06 '23 19:06 sime2408
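One detail the method above takes for granted: long documents have to be split into chunks before they are embedded and ingested into the store (langchain's text splitters normally do this at ingestion time). A minimal character-based chunker sketch, assuming fixed-size windows with overlap so sentences cut at a boundary still appear whole in some chunk:

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into fixed-size character windows with overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already covers the end of the text
    return chunks

doc = "x" * 250
chunks = chunk_text(doc, chunk_size=100, overlap=20)
print(len(chunks))  # 3 windows: [0:100], [80:180], [160:250]
```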

This will be really great to be able to do. Would love to query my own set of documents.

foscraft avatar Jun 14 '23 07:06 foscraft

So currently we are not able to generate embeddings with any GPT4All models?

goheesheng avatar Jun 19 '23 13:06 goheesheng

@goheesheng You can do it using different models, though; like the example above, @TomasMiloCA is using a Hugging Face model with ChromaDB.

foscraft avatar Jun 20 '23 09:06 foscraft

BERT is meant to be used for embeddings.

It was added to the official JSON list of models: https://github.com/nomic-ai/gpt4all/commit/a0dae86a957337b20c3a64cc48480126062b9300

I put a comment in about whether Bert or similar models would be supported or at least work.

lilpenguin42 avatar Jul 26 '23 15:07 lilpenguin42

I think this is implemented now?

niansa avatar Aug 10 '23 15:08 niansa