gpt4all
Generate Embeddings
Hi @AndriyMulyar, thanks for all the hard work in making this available. I was wondering whether there's a way to generate embeddings using this model so we can do question answering over a custom set of documents? I feel that would be a great addition, especially in an enterprise context.
I'd really like to know this as well. Maybe someone is working on an OpenAI-like library for Python?
See the gpt4all README for the new official bindings. Getting embeddings out is high on the priority list.
@AndriyMulyar so this support is not yet developed, right? Any idea about dates? I found nothing in the gpt4all README.
@AndriyMulyar Just wondering if there is any progress on getting embeddings. Really looking forward to this!
It is on the priority list!
In the meantime, check out Sentence-BERT! It's a high-quality, free-to-use embedding model.
It would indeed be great to be able to generate embeddings with gpt4all itself. Any progress on this?
@marc-dsalab you can use some other models for the embedding step, for example (imports and a `retriever` definition added for completeness; `persist_directory`, `CHROMA_SETTINGS`, `model_n_ctx`, and `callbacks` are defined elsewhere in the caller's code):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
retriever = db.as_retriever()
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
```
That works well. Yet, it would be nice to have "native" gpt4all embeddings. ;)
Any update on embeddings in GPT4All? I am a long-time C# dev. Are you planning C# bindings? Also, I am not clear: people suggest using other models for creating embeddings. Are there C# bindings for that? Can anybody explain? I see that embeddings are created with a HuggingFaceEmbeddings model, then I assume stored in a Chroma vector database... and then what?
```python
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings, client_settings=CHROMA_SETTINGS)
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
```
Hi - where did you get the function `HuggingFaceEmbeddings`, and what did you use as your `retriever` variable?
@TomasMiloCA `HuggingFaceEmbeddings` is from the langchain library; `retriever` comes from the Chroma vector store. Pasting you the real method from my program:
```python
def process_database_question(database_name, llm):
    embeddings = OpenAIEmbeddings() if openai_use else HuggingFaceEmbeddings(model_name=ingest_embeddings_model)
    persist_dir = f"./db/{database_name}"
    db = Chroma(persist_directory=persist_dir, embedding_function=embeddings, client_settings=Settings(
        chroma_db_impl='duckdb+parquet',
        persist_directory=persist_dir,
        anonymized_telemetry=False
    ))
    retriever = db.as_retriever(search_kwargs={"k": ingest_target_source_chunks if ingest_target_source_chunks else args.ingest_target_source_chunks})
    template = """You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question.
Provide a conversational answer based on the context provided. If you can't find the answer in the context below, just say
"Hmm, I'm not sure." Don't try to make up an answer. If the question is not related to the context, politely respond
that you are tuned to only answer questions that are related to the context.

Question: {question}
=========
{context}
=========
Answer:"""
    question_prompt = PromptTemplate(template=template, input_variables=["question", "context"])
    qa = ConversationalRetrievalChain.from_llm(llm=llm, condense_question_prompt=question_prompt, retriever=retriever, chain_type="stuff", return_source_documents=not args.hide_source)
    return qa
```
This would be really great to be able to do. Would love to ask questions over my own set of documents.
So currently we are not able to generate embeddings with any GPT4All models?
@goheesheng You can do it using different models, though, like the example above; @TomasMiloCA is using the Hugging Face model with Chroma.
Bert is meant to be used for embeddings.
It was added to the official JSON list of models: https://github.com/nomic-ai/gpt4all/commit/a0dae86a957337b20c3a64cc48480126062b9300
I put a comment in about whether Bert or similar models would be supported or at least work.
I think this is implemented now?