Inconsistent results with Gemma RAG using LangChain
Trying RAG with Gemma and LangChain:
from langchain import hub
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load and split documents
loader = WebBaseLoader("https://h3manth.com")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
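For context, the splitter call above cuts each page into windows of at most 500 characters with no overlap. A minimal pure-Python sketch of that behavior (`split_text` is a hypothetical helper for illustration, not the actual LangChain class, which also tries to split on separators like paragraphs first):

```python
def split_text(text, chunk_size=500, chunk_overlap=0):
    """Split text into chunks of at most chunk_size characters,
    stepping forward by chunk_size - chunk_overlap each time."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1200-character input yields chunks of 500, 500, and 200 characters.
chunks = split_text("a" * 1200, chunk_size=500)
print([len(c) for c in chunks])  # → [500, 500, 200]
```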
# Create vector store
vectorstore = FAISS.from_documents(documents=all_splits, embedding=HuggingFaceEmbeddings())
# Load RAG prompt
prompt = hub.pull("rlm/rag-prompt")
# Create LLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1000,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    do_sample=True,
)
llm = HuggingFacePipeline(pipeline=pipe)
# Create RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff",  # Stuff all retrieved chunks into one prompt
    chain_type_kwargs={"prompt": prompt},
)
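For reference, `chain_type="stuff"` simply concatenates the retrieved chunks into the prompt's context slot before calling the LLM. A minimal sketch of that idea (`stuff_prompt` is a hypothetical stand-in, and the template text here is illustrative, not the actual `rlm/rag-prompt`):

```python
def stuff_prompt(question, docs):
    """Sketch of the 'stuff' chain: join all retrieved docs into one
    context string and interpolate it into a RAG-style template."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

prompt_text = stuff_prompt("List the References", ["chunk one", "chunk two"])
print(prompt_text)
```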
# Ask question
question = "List the References"
response = qa_chain({"query": question})
print(response["result"])
This doesn't return any content, or sometimes produces random text. Is there something missing in the pipeline?
Notebook for the same.