
Using Dolly-v2-7b with langchain getting error: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

Open zubair-ahmed-ai opened this issue 1 year ago • 0 comments

System Info

Langchain 0.0.171, Python 3.9.0, OS Ubuntu 20.04.6 LTS

Hi @hwchase17 @agola11

I am using the dolly-v2-7b model with LangChain and running into the error in the title. My question is how to chain the inputs properly so that the chunk from the first chain is fed into the next one, assuming that is the right way to avoid repetition, rather than producing the whole generation in one call. Previously, dolly-v2-3b repeated the same generation 3-4 times.

I am using the following code to generate sample NDAs. It is fed a FAISS vector store whose embeddings were generated with InstructorEmbedding (instructor-xl, not OpenAI), but I get this error: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3

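from langchain.chains import LLMChain, SequentialChain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

prompt_template = """Use the context below to write a detailed 5000 words NDA between the two persons: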
    Context: {context}
    Topic: {topic}
    Disclosing Party: {disclosingparty}
    Receiving Party: {receivingparty}
    NDA:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "topic", "disclosingparty", "receivingparty"]
)


llm = HuggingFacePipeline.from_model_id(
    model_id="dolly-v2-7b",
    task="text-generation",
    model_kwargs={"temperature": 0, "max_length": 5000},
)

chain = LLMChain(llm=llm, prompt=PROMPT, output_key="nda_1")

prompt_template = """Using the NDA generated above within the same context as before, continue writing the nda:
    Context: {nda_1}        
    NDA:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["nda_1"]
)
continue_chain = LLMChain(llm=llm, prompt=PROMPT, output_key="nda_2")
overall_chain = SequentialChain(
    chains=[chain, continue_chain],
    input_variables=["context", "topic", "disclosingparty", "receivingparty"],
    output_variables=["nda_1", "nda_2"],
    verbose=True,
)

def generate_text(topic, disclosingparty, receivingparty):
    docs = db_instructEmbedd.similarity_search(topic, k=4)
    inputs = [{"context": doc.page_content, "topic": topic, "disclosingparty": disclosingparty, "receivingparty" : receivingparty} for doc in docs]
    return overall_chain.apply(inputs)

response = generate_text("Based on this acquired knowledge, write a detailed NDA in 5000 words or less between these two parties on date May 15, 2023 governing rules of <country>, dont be repetitive and include all the required clauses to make it comprehensive contract", "Mr. X", "Mr. Y")
print(response)
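
The tensor sizes in the error (2048 vs 2049) look consistent with generation stepping one token past a 2048-position context window, which a `max_length` of 5000 would ask for. The following is a minimal sketch of a budget check under that assumption. `CONTEXT_WINDOW`, `count_tokens`, and `max_new_tokens_for` are illustrative names, not LangChain or transformers APIs, and whitespace splitting is only a crude stand-in for the model's real tokenizer.

```python
CONTEXT_WINDOW = 2048  # assumed limit, inferred from the sizes in the error message


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words.

    A real tokenizer will usually report a higher count for the same text.
    """
    return len(text.split())


def max_new_tokens_for(prompt: str, window: int = CONTEXT_WINDOW, reserve: int = 0) -> int:
    """Return how many tokens could still be generated after the prompt.

    `reserve` holds back extra room, for example for special tokens.
    The result is clamped at zero when the prompt alone fills the window.
    """
    remaining = window - count_tokens(prompt) - reserve
    return max(remaining, 0)
```

If this reading is right, the prompt plus the generated text has to fit inside the window, so the retrieved context and the requested output length would both need to shrink until the budget is positive.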

Who can help?

No response

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [X] LLMs/Chat Models
  • [X] Embedding Models
  • [X] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [X] Vector Stores / Retrievers
  • [X] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [X] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

Generate the embeddings with instructor-xl, then run the SequentialChain above to generate the text.

Expected behavior

It should generate long-form text.
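
One possible way to get long-form output from a model with a limited context is to generate it in rounds and pass only the tail of the previous chunk into the next prompt, instead of the whole earlier text. The sketch below illustrates that idea only. `tail_words` and `generate_long` are hypothetical helpers, `generate` stands for any callable that maps a prompt to text (for example a wrapper around an LLM call), and the continuation prompt wording is an assumption.

```python
from typing import Callable, List


def tail_words(text: str, limit: int) -> str:
    """Keep only the last `limit` whitespace-separated words of `text`."""
    if limit <= 0:
        return ""
    return " ".join(text.split()[-limit:])


def generate_long(
    first_prompt: str,
    generate: Callable[[str], str],
    rounds: int,
    tail_limit: int,
) -> List[str]:
    """Generate `rounds` chunks, feeding each round the tail of the last chunk."""
    chunks = [generate(first_prompt)]
    for _ in range(rounds - 1):
        # Only the end of the previous chunk is carried over, so the
        # prompt size stays bounded no matter how much text exists so far.
        context = tail_words(chunks[-1], tail_limit)
        prompt = (
            "Continue the NDA without repeating earlier clauses.\n"
            f"Previous text: {context}\n"
            "NDA:"
        )
        chunks.append(generate(prompt))
    return chunks
```

The chunks can be joined afterwards. Whether this reduces repetition depends on the model and the prompt, so it is a starting point to experiment with rather than a confirmed fix.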

zubair-ahmed-ai · May 17 '23 16:05