
Attribute error tuple has no attribute 'page_content'

Open vmunagal opened this issue 2 years ago • 8 comments

chain = load_qa_chain(llm, chain_type="stuff")
answer = chain.run(input_documents=similar_docs, question=query)

This raises the attribute error below:

8 frames
/usr/local/lib/python3.10/dist-packages/langchain/chains/combine_documents/base.py in format_document(doc, prompt)
     14 def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
     15     """Format a document into a string based on a prompt template."""
---> 16     base_info = {"page_content": doc.page_content}
     17     base_info.update(doc.metadata)
     18     missing_metadata = set(prompt.input_variables).difference(base_info)

AttributeError: 'tuple' object has no attribute 'page_content'

vmunagal avatar Apr 29 '23 15:04 vmunagal

Please provide working code that can reproduce this issue.

PawelFaron avatar May 04 '23 13:05 PawelFaron

Experiencing the same thing.

OlajideOgun avatar May 06 '23 17:05 OlajideOgun

doc is a tuple, not a Document.

OlajideOgun avatar May 06 '23 17:05 OlajideOgun

A workaround that works for me for now is to add doc = doc[0] at the top of format_document:

```python
def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
    """Format a document into a string based on a prompt template."""
    # Workaround: doc arrives as a tuple here, so take its first element.
    doc = doc[0]

    base_info = {"page_content": doc.page_content}
    base_info.update(doc.metadata)
    missing_metadata = set(prompt.input_variables).difference(base_info)
    if len(missing_metadata) > 0:
        required_metadata = [
            iv for iv in prompt.input_variables if iv != "page_content"
        ]
        raise ValueError(
            f"Document prompt requires documents to have metadata variables: "
            f"{required_metadata}. Received document with missing metadata: "
            f"{list(missing_metadata)}."
        )
    document_info = {k: base_info[k] for k in prompt.input_variables}
    return prompt.format(**document_info)
```

OlajideOgun avatar May 06 '23 22:05 OlajideOgun

The issue happened when I tried to use similarity_search_with_score with chain.run:

docs_and_scores = db.similarity_search_with_score("what are microgreens?", k=4)
llm = ChatOpenAI()
chain = load_qa_chain(llm=llm, chain_type="stuff")
print(chain.run(input_documents=docs_and_scores, question="what are microgreens?"))

similarity_search_with_score returns (Document, score) tuples, so I had to modify my code like this to extract the documents from docs_and_scores before passing them to the chain:

docs = [item[0] for item in docs_and_scores]

AliAkhtari78 avatar May 07 '23 08:05 AliAkhtari78

Yeah, that's correct. The issue got fixed once I removed the score.

vmunagal avatar May 07 '23 19:05 vmunagal
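
For the similarity_search_with_score case discussed above, here is a minimal sketch of dropping the scores before handing the results to a chain. The Document class is a plain stand-in so the snippet runs without langchain installed, and strip_scores plus the sample data are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Document:
    """Stand-in for langchain's Document (an assumption, for illustration only)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def strip_scores(docs_and_scores: List[Tuple[Document, float]]) -> List[Document]:
    """Drop the score from each (Document, score) pair."""
    return [doc for doc, _score in docs_and_scores]


# Hypothetical search results in the (Document, score) shape.
docs_and_scores = [
    (Document("Microgreens are young vegetable greens."), 0.31),
    (Document("They are harvested soon after sprouting."), 0.42),
]

docs = strip_scores(docs_and_scores)
print([d.page_content for d in docs])
```

The resulting docs list is what would be passed as input_documents. If the scores are still needed, keep docs_and_scores around and filter on it before stripping.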

The error can be reproduced by running this example and then trying to use the text splitter: https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/copypaste.html

from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
text = "..... put the text you copy pasted here......"
doc = Document(page_content=text)
metadata = {"source": "internet", "date": "Friday"}
doc = Document(page_content=text, metadata=metadata)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1, chunk_overlap=0)
texts = text_splitter.split_documents(doc)

AttributeError                            Traceback (most recent call last) in
      1 text_splitter = RecursiveCharacterTextSplitter(chunk_size=1, chunk_overlap=0)
----> 2 texts = text_splitter.split_documents(doc)


/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in split_documents(self, documents)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in <listcomp>(.0)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

AttributeError: 'tuple' object has no attribute 'page_content'

The error message looks wrong, because page_content is there when you print doc:

Document(page_content='..... put the text you copy pasted here......', metadata={'source': 'internet', 'date': 'Friday'})

I am still experiencing the same issue, and doc[0] produces another error: TypeError: 'Document' object is not subscriptable

lindseypeng avatar May 11 '23 21:05 lindseypeng
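
A likely explanation for the report above, sketched under an assumption: if Document behaves like a pydantic v1 model, iterating over a single instance yields (field_name, value) tuples, so the list comprehension inside split_documents sees tuples even though the object itself has page_content. The class below is a stand-in that mimics that iteration behaviour; it is not langchain's own class.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


class Document:
    """Stand-in mimicking pydantic-style iteration (an assumption, not langchain's class)."""

    def __init__(self, page_content: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        self.page_content = page_content
        self.metadata = metadata or {}

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Yield (field_name, value) pairs, the way a pydantic v1 model does.
        yield "page_content", self.page_content
        yield "metadata", self.metadata


doc = Document("some text", {"source": "internet"})

# Roughly what "for doc in documents" sees when documents is a bare Document.
items = [item for item in doc]
print(items)

# Each item is a plain tuple, so accessing .page_content on it fails.
print(hasattr(items[0], "page_content"))
```

This would also account for doc[0] failing: the Document itself is iterable but not subscriptable, so indexing it raises a TypeError.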

Try the following code:

from langchain.llms import OpenAI
import environ
env = environ.Env()
environ.Env.read_env()
API_KEY = env('OPENAI_API_KEY')

from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
import textwrap

llm = OpenAI(model_name="text-davinci-003", openai_api_key=API_KEY)

text_splitter = CharacterTextSplitter()
with open("data.txt") as f:
    data = f.read()
texts = text_splitter.split_text(data)

docs = [Document(page_content=t) for t in texts[:3]]

chain = load_summarize_chain(llm, chain_type="map_reduce")
output_summary = chain.run(docs)

wrapped_text = textwrap.fill(output_summary, width=120)
print(wrapped_text)

codemaker2015 avatar May 22 '23 22:05 codemaker2015

I have another workaround: update the method like this

    def from_documents(
        cls: Type[VST],
        documents: List[Document],
        embedding: Embeddings,
        **kwargs: Any,
    ) -> VST:
        """Return VectorStore initialized from documents and embeddings."""
        metadatas = []
        texts = []
        for d in documents:
            if d[0] == 'metadata':
                metadatas.append(d[1])
            elif d[0] == 'page_content':
                texts.append(d[1])
            

OlajideOgun avatar May 29 '23 06:05 OlajideOgun

Experiencing the same issue when

original_document = Document(page_content=message, metadata={})
split_documents = self.text_splitter.split_documents(original_document)
summary = self.summarize_chain(split_documents)

flynnoct avatar May 30 '23 15:05 flynnoct

@lindseypeng Did you find solution to the error?

Atharva1763 avatar Jun 10 '23 18:06 Atharva1763

Experiencing the same issue when

original_document = Document(page_content=message, metadata={})
split_documents = self.text_splitter.split_documents(original_document)
summary = self.summarize_chain(split_documents)

@flynnoct I believe split_documents takes a list of documents, so your problem may be fixed by using self.text_splitter.split_documents([original_document])

liowalex avatar Jul 30 '23 02:07 liowalex
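
Following the list-wrapping suggestion above, a small guard can normalise the input before it reaches split_documents. This is only a sketch: as_document_list is a hypothetical helper, and Document is a plain stand-in so the snippet runs without langchain installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Document:
    """Stand-in for langchain's Document (an assumption, for illustration only)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def as_document_list(docs: Union[Document, List[Document]]) -> List[Document]:
    """Hypothetical helper: wrap a single Document in a list, copy lists through."""
    if isinstance(docs, Document):
        return [docs]
    return list(docs)


original_document = Document(page_content="a long message", metadata={})
batch = as_document_list(original_document)
print(len(batch), batch[0].page_content)
```

With real langchain objects, the equivalent call would be split_documents(as_document_list(original_document)), which behaves the same as passing [original_document] directly.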

Hi, @vmunagal! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking the issue titled "Attribute error tuple has no attribute 'page_content'" as stale.

Based on my understanding, this issue is causing an attribute error when trying to format a document into a string based on a prompt template. PawelFaron requested you to provide working code that can reproduce the issue, and OlajideOgun and AliAkhtari78 also experienced the same issue and provided workarounds. lindseypeng shared an example that reproduces the error.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot[bot] avatar Oct 29 '23 16:10 dosubot[bot]