gpt4all
How can I implement a custom LangChain LLM wrapper class for the GPT4All model?
Is it possible to do what is described here
https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-using-a-custom-llm-model
or here
https://python.langchain.com/en/latest/modules/models/llms/examples/custom_llm.html#how-to-write-a-custom-llm-wrapper
with the https://github.com/nomic-ai/gpt4all model?
I'd appreciate any help / hints!
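While waiting for official support, the custom-wrapper pattern from the LangChain docs linked above can be sketched roughly like this. The class name `GPT4AllLLM`, the `weights_path` field, and the `_generate_text` stub are my own inventions; the actual generation call (e.g. via pyllamacpp) is stubbed out, and the try/except lets the sketch run even where langchain isn't installed:

```python
# Sketch of a custom LangChain LLM wrapper around a local GPT4All model.
# The real model call is stubbed so this file is self-contained; swap
# _generate_text for an actual binding (e.g. pyllamacpp) in practice.
from typing import List, Optional

try:
    from langchain.llms.base import LLM
except ImportError:
    class LLM:  # stand-in base class so the sketch works without langchain
        pass


class GPT4AllLLM(LLM):
    """Exposes a local GPT4All model through LangChain's LLM interface."""

    weights_path: str = "./gpt4all-converted.bin"  # example path

    @property
    def _llm_type(self) -> str:
        return "gpt4all"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # LangChain calls this for every completion; delegate to the model.
        return self._generate_text(prompt)

    def _generate_text(self, prompt: str) -> str:
        # Placeholder generation so the sketch runs without model weights.
        return "[gpt4all output for: " + prompt + "]"
```

Once the stub is replaced with a real binding, the instance can be dropped into any chain that expects an `LLM`.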
this is being built as we speak
Lovely! I'd love to test that on my 50M collection of Q&A articles, so much potential!
Really excited for this !
seems like it is released. Version 0.0.131 👀
https://github.com/hwchase17/langchain/releases/tag/v0.0.131
Any chance there's an example how to use it? I'm looking to swap OpenAI with Gpt4all in code https://colab.research.google.com/drive/1JYTczk-4D86XNn0GTaXux5yi2-LfoIPd?usp=sharing
I'm looking at the exact same thing. I don't know how to do it yet myself. Will spend some time on it tomorrow. If I get something out, I will share it here. Meanwhile, if you come across something, do share, thanks! :)
It is now released. https://twitter.com/LangChainAI/status/1643261943803957249?t=N5nC6IQgfeJo6Kda2ri49w&s=19
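For anyone landing here later: the release above (LangChain 0.0.131) added a `GPT4All` LLM class. A hedged sketch of its usage follows; the `model` keyword and the path are assumptions based on the docs of that era and may differ in later versions, so the helper only constructs the LLM when langchain and the model file are actually present:

```python
# Sketch: constructing LangChain's GPT4All LLM class, guarded so it
# degrades gracefully when langchain or the model weights are missing.
import os

MODEL_PATH = "./gpt4all-converted.bin"  # example path, adjust to your model


def make_gpt4all_llm(model_path: str = MODEL_PATH):
    """Return a LangChain GPT4All LLM, or None if prerequisites are missing
    (langchain not installed, or no model file at model_path)."""
    try:
        from langchain.llms import GPT4All
    except ImportError:
        return None
    if not os.path.exists(model_path):
        return None
    return GPT4All(model=model_path)


llm = make_gpt4all_llm()
if llm is not None:
    print(llm("Tell me a joke: "))
```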
@fadnavismehul these seem to be the closest thing: https://blog.ouseful.info/2023/04/04/running-gpt4all-on-a-mac-using-python-langchain-in-a-jupyter-notebook/ https://blog.ouseful.info/2023/04/04/langchain-query-gpt4all-against-knowledge-source/
Does anyone have a working example? I am struggling with an exception saying ctx is not properly initialized when I try
docsearch = Chroma.from_documents(documents = texts, embedding = embeddings)
I load my model like this:
embeddings = LlamaCppEmbeddings(model_path=GPT4ALL_MODEL_PATH)
@sime2408, here's my tiny working test (WSL2/Ubuntu)
# https://python.langchain.com/en/latest/ecosystem/llamacpp.html
# pip uninstall -y langchain
# pip install --upgrade git+https://github.com/hwchase17/langchain.git
#
# https://abetlen.github.io/llama-cpp-python/
# pip uninstall -y llama-cpp-python
# pip install --upgrade llama-cpp-python
# pip install chromadb
#
# how to create one https://github.com/nomic-ai/pyllamacpp
import os
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.llms import LlamaCpp
from langchain.embeddings import LlamaCppEmbeddings

GPT4ALL_MODEL_PATH = "./gpt4all-converted.bin"


def ask(question, qa):
    print('\n' + question)
    print(qa.run(question) + '\n\n')


persist_directory = './.chroma'
collection_name = 'data'
document_name = './test_import.txt'

llama_embeddings = LlamaCppEmbeddings(model_path=GPT4ALL_MODEL_PATH)

if not os.path.isdir(persist_directory):
    print('Parsing ' + document_name)
    loader = TextLoader(document_name)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    vectordb = Chroma.from_documents(
        documents=texts, embedding=llama_embeddings,
        collection_name=collection_name, persist_directory=persist_directory)
    vectordb.persist()
    print(vectordb)
    print('Saved to ' + persist_directory)
else:
    print('Loading ' + persist_directory)
    vectordb = Chroma(persist_directory=persist_directory,
                      embedding_function=llama_embeddings,
                      collection_name=collection_name)
    print(vectordb)

llm = LlamaCpp(model_path=GPT4ALL_MODEL_PATH)
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 1}))

ask("Question1", qa)
ask("Question2", qa)
ask("Question3", qa)
Thanks @traverse-in-reverse. At the end I was trying to create a custom wrapper, and also tried the Vicuna model; here are some trials, if they're of any value: vicuna colab
@traverse-in-reverse found out why I couldn't run my examples: my model couldn't be initialized. I switched to the GGML model from the gpt4all repo and now it works. Not sure why.
@traverse-in-reverse any idea why I am getting the error below with the same code block you provided?
Error:
NoIndexException: Index not found, please create an instance before querying
The NoIndexException can be fixed as described here: https://github.com/hwchase17/langchain/issues/2491#issuecomment-1499082189
I need to train gpt4all on the BWB dataset (a large-scale document-level Chinese-English parallel dataset for machine translation). Is there any guide on how to do this?
Stale, please open a new issue if this still occurs