langchain Attribute error tuple has no attribute 'page_content'

chain = load_qa_chain(llm, chain_type="stuff")
answer = chain.run(input_documents=similar_docs, question=query)

This returns an attribute error, as shown below:

8 frames
/usr/local/lib/python3.10/dist-packages/langchain/chains/combine_documents/base.py in format_document(doc, prompt)
     14 def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
     15     """Format a document into a string based on a prompt template."""
---> 16     base_info = {"page_content": doc.page_content}
     17     base_info.update(doc.metadata)
     18     missing_metadata = set(prompt.input_variables).difference(base_info)

AttributeError: 'tuple' object has no attribute 'page_content'

Apr 29 '23 15:04 vmunagal

Please provide working code that can reproduce this issue.

May 04 '23 13:05 PawelFaron

Experiencing the same thing.

May 06 '23 17:05 OlajideOgun

doc is a tuple, not a Document.

May 06 '23 17:05 OlajideOgun

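A workaround that works for me for now is to set doc = doc[0] at the top of format_document:

def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
    """Format a document into a string based on a prompt template."""
    # Workaround: doc arrives as a tuple, so take its first element.
    doc = doc[0]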

    base_info = {"page_content": doc.page_content}
    base_info.update(doc.metadata)
    missing_metadata = set(prompt.input_variables).difference(base_info)
    if len(missing_metadata) > 0:
        required_metadata = [
            iv for iv in prompt.input_variables if iv != "page_content"
        ]
        raise ValueError(
            f"Document prompt requires documents to have metadata variables: "
            f"{required_metadata}. Received document with missing metadata: "
            f"{list(missing_metadata)}."
        )
    document_info = {k: base_info[k] for k in prompt.input_variables}
    return prompt.format(**document_info)

May 06 '23 22:05 OlajideOgun

The issue happened when I tried to use similarity_search_with_score with chain.run:

docs_and_scores = db.similarity_search_with_score("what are microgreens?", k=4)
llm = ChatOpenAI()
chain = load_qa_chain(llm=llm, chain_type="stuff")
print(chain.run(input_documents=docs, question="what are microgreens?"))

So I had to modify my code like this to extract the documents from docs_and_scores:

docs = [item[0] for item in docs_and_scores]
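
A minimal sketch of that unpacking, using a stand-in Document dataclass so it runs without LangChain installed. The helper name strip_scores is hypothetical, and the only assumption is what this thread reports: similarity_search_with_score returns (Document, score) pairs, while chain.run expects plain Document objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    """Stand-in for langchain's Document, for illustration only."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def strip_scores(docs_and_scores: List[Tuple[Document, float]]) -> List[Document]:
    """Hypothetical helper: keep the Document from each (Document, score) pair."""
    return [doc for doc, _score in docs_and_scores]


# Sample data shaped like the (Document, score) pairs described in the thread.
docs_and_scores = [
    (Document("first chunk"), 0.12),
    (Document("second chunk"), 0.34),
]

docs = strip_scores(docs_and_scores)
print([d.page_content for d in docs])
```

The resulting docs list is what would be passed as input_documents; the scores can still be read from docs_and_scores if they are needed for filtering.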

May 07 '23 08:05 AliAkhtari78

Yeah, that's correct. The issue got fixed once I removed the score.

May 07 '23 19:05 vmunagal

The error can be reproduced by just running the example and passing the result to the text splitter: https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/copypaste.html

from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
text = "..... put the text you copy pasted here......"
doc = Document(page_content=text)
metadata = {"source": "internet", "date": "Friday"}
doc = Document(page_content=text, metadata=metadata)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1, chunk_overlap=0)
texts = text_splitter.split_documents(doc)

AttributeError Traceback (most recent call last) in
      1 text_splitter = RecursiveCharacterTextSplitter(chunk_size=1, chunk_overlap=0)
----> 2 texts = text_splitter.split_documents(doc)


/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in split_documents(self, documents)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in <listcomp>(.0)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

AttributeError: 'tuple' object has no attribute 'page_content'

The error message seems wrong, because page_content is there when you print doc:

Document(page_content='..... put the text you copy pasted here......', metadata={'source': 'internet', 'date': 'Friday'})

I am still experiencing the same issue, and doc[0] produces another error: TypeError: 'Document' object is not subscriptable

May 11 '23 21:05 lindseypeng

Try the following code:

from langchain.llms import OpenAI
import environ
env = environ.Env()
environ.Env.read_env()
API_KEY = env('OPENAI_API_KEY')

from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
import textwrap

llm = OpenAI(model_name="text-davinci-003", openai_api_key=API_KEY)

text_splitter = CharacterTextSplitter()
with open("data.txt") as f:
    data = f.read()
texts = text_splitter.split_text(data)

docs = [Document(page_content=t) for t in texts[:3]]

chain = load_summarize_chain(llm, chain_type="map_reduce")
output_summary = chain.run(docs)

wrapped_text = textwrap.fill(output_summary, width=120)
print(wrapped_text)

May 22 '23 22:05 codemaker2015

I have another workaround: you can update the from_documents method like this:

    def from_documents(
        cls: Type[VST],
        documents: List[Document],
        embedding: Embeddings,
        **kwargs: Any,
    ) -> VST:
        """Return VectorStore initialized from documents and embeddings."""
        metadatas = []
        texts = []
        for d in documents:
            if d[0] == 'metadata':
                metadatas.append(d[1])
            elif d[0] == 'page_content':
                texts.append(d[1])

May 29 '23 06:05 OlajideOgun

Experiencing the same issue when

original_document = Document(page_content=message, metadata={})
split_documents = self.text_splitter.split_documents(original_document)
summary = self.summarize_chain(split_documents)

May 30 '23 15:05 flynnoct

@lindseypeng Did you find a solution to the error?

Jun 10 '23 18:06 Atharva1763

Experiencing the same issue when

original_document = Document(page_content=message, metadata={})
split_documents = self.text_splitter.split_documents(original_document)
summary = self.summarize_chain(split_documents)

@flynnoct I believe split_documents takes a list of documents, so your problem may be fixed by using self.text_splitter.split_documents([original_document]).
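
A possible explanation for why a lone Document shows up as tuples: assuming Document was a pydantic model in these versions, iterating it yields (field_name, value) pairs, so the `for doc in documents` loop inside split_documents sees tuples when documents is a single Document rather than a list. The sketch below mimics that behavior with a stand-in class (not langchain's real Document) and uses a hypothetical as_document_list helper to wrap a lone document first.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple, Union


class Document:
    """Stand-in that mimics iterating a pydantic model; not langchain's class."""

    def __init__(self, page_content: str, metadata: Optional[Dict[str, Any]] = None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Assumption: pydantic models yield (field_name, value) pairs when iterated.
        yield "page_content", self.page_content
        yield "metadata", self.metadata


def as_document_list(docs: Union[Document, List[Document]]) -> List[Document]:
    """Hypothetical helper: wrap a lone Document so callers always get a list."""
    return [docs] if isinstance(docs, Document) else list(docs)


original_document = Document("some text", {"source": "internet"})

# Iterating the bare Document produces tuples, which have no .page_content.
items = list(original_document)
print(type(items[0]).__name__)

# Wrapping it first gives the loop real Document objects.
texts = [doc.page_content for doc in as_document_list(original_document)]
print(texts)
```

This would also account for doc[0] failing with "'Document' object is not subscriptable" in the earlier comment: there the variable is a real Document, and only the loop inside the splitter sees tuples.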

Jul 30 '23 02:07 liowalex

Hi, @vmunagal! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking the issue titled "Attribute error tuple has no attribute 'page_content'" as stale.

Based on my understanding, this issue is causing an attribute error when trying to format a document into a string based on a prompt template. PawelFaron requested you to provide working code that can reproduce the issue, and OlajideOgun and AliAkhtari78 also experienced the same issue and provided workarounds. lindseypeng shared an example that reproduces the error.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

Oct 29 '23 16:10 dosubot[bot]
