AttributeError: 'tuple' object has no attribute 'page_content'
The following code raises an AttributeError:

```python
chain = load_qa_chain(llm, chain_type="stuff")
answer = chain.run(input_documents=similar_docs, question=query)
```

```
/usr/local/lib/python3.10/dist-packages/langchain/chains/combine_documents/base.py in format_document(doc, prompt)
     14 def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
     15     """Format a document into a string based on a prompt template."""
---> 16     base_info = {"page_content": doc.page_content}
     17     base_info.update(doc.metadata)
     18     missing_metadata = set(prompt.input_variables).difference(base_info)

AttributeError: 'tuple' object has no attribute 'page_content'
```
Please provide working code that can reproduce this issue.
Experiencing the same thing.
Here `doc` is a tuple, not a `Document`. The workaround that works for me for now is to unpack it with `doc = doc[0]` inside `format_document`:

```python
def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
    """Format a document into a string based on a prompt template."""
    # workaround: doc arrives as a (Document, score) tuple, so unpack it
    doc = doc[0]
    base_info = {"page_content": doc.page_content}
    base_info.update(doc.metadata)
    missing_metadata = set(prompt.input_variables).difference(base_info)
    if len(missing_metadata) > 0:
        required_metadata = [
            iv for iv in prompt.input_variables if iv != "page_content"
        ]
        raise ValueError(
            f"Document prompt requires documents to have metadata variables: "
            f"{required_metadata}. Received document with missing metadata: "
            f"{list(missing_metadata)}."
        )
    document_info = {k: base_info[k] for k in prompt.input_variables}
    return prompt.format(**document_info)
```
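Note that this patches the installed package and assumes every `doc` is a tuple, so it would break the normal case where plain `Document` objects are passed in. Unpacking at the call site, as in the next comment, is less invasive.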
The issue happened when I tried to use `similarity_search_with_score` with `chain.run`:

```python
docs_and_scores = db.similarity_search_with_score("what are microgreens?", k=4)
llm = ChatOpenAI()
chain = load_qa_chain(llm=llm, chain_type="stuff")
print(chain.run(input_documents=docs_and_scores, question="what are microgreens?"))
```
So I have to modify my code like this to extract each document from `docs_and_scores`:

```python
docs = [item[0] for item in docs_and_scores]
```
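For anyone hitting this, a minimal sketch of the mismatch, using hypothetical example documents (only the shapes matter): `similarity_search_with_score` returns `(Document, score)` pairs, while the chain expects bare `Document` objects.

```python
from langchain.docstore.document import Document

# similarity_search_with_score returns (Document, float) pairs, e.g.:
docs_and_scores = [
    (Document(page_content="Microgreens are young vegetable greens."), 0.12),
    (Document(page_content="They are harvested soon after sprouting."), 0.27),
]

# Passing the pairs straight to the chain makes format_document receive a
# tuple, hence the AttributeError. Unpack the Documents first:
docs = [doc for doc, _score in docs_and_scores]
assert all(isinstance(d, Document) for d in docs)
```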
Yeah, that's correct. The issue got fixed once I removed the score.
The error can also be reproduced by running the copy-paste example and then trying the text splitter: https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/copypaste.html
```python
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

text = "..... put the text you copy pasted here......"
doc = Document(page_content=text)

metadata = {"source": "internet", "date": "Friday"}
doc = Document(page_content=text, metadata=metadata)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1, chunk_overlap=0)
texts = text_splitter.split_documents(doc)  # raises the AttributeError
```
```
AttributeError                            Traceback (most recent call last)

/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in split_documents(self, documents)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

/usr/local/anaconda3/lib/python3.8/site-packages/langchain/text_splitter.py in <listcomp>(.0)
     63     def split_documents(self, documents: List[Document]) -> List[Document]:
     64         """Split documents."""
---> 65         texts = [doc.page_content for doc in documents]
     66         metadatas = [doc.metadata for doc in documents]
     67         return self.create_documents(texts, metadatas=metadatas)

AttributeError: 'tuple' object has no attribute 'page_content'
```
The error message looks wrong at first, because `page_content` is clearly there when you print the doc:

```python
Document(page_content='..... put the text you copy pasted here......', metadata={'source': 'internet', 'date': 'Friday'})
```
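The likely cause (an explanation from pydantic's behavior, not something shown in the traceback itself): `Document` is a pydantic model in these LangChain versions, and iterating over a pydantic model yields `(field_name, value)` tuples. `split_documents` expects a *list* of documents and iterates its argument, so a bare `Document` silently produces tuples. A minimal sketch:

```python
from langchain.docstore.document import Document

doc = Document(page_content="some text", metadata={"source": "internet"})

# Iterating a pydantic model yields (field_name, value) tuples:
for item in doc:
    print(item)
# ('page_content', 'some text')
# ('metadata', {'source': 'internet'})

# So split_documents(doc) sees tuples and tuple.page_content fails,
# while split_documents([doc]) sees a Document and works.
```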
I am still experiencing the same issue, and `doc[0]` produces another error: `TypeError: 'Document' object is not subscriptable`.
Try the following code:
```python
from langchain.llms import OpenAI
import environ

env = environ.Env()
environ.Env.read_env()
API_KEY = env('OPENAI_API_KEY')

from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
import textwrap

llm = OpenAI(model_name="text-davinci-003", openai_api_key=API_KEY)
text_splitter = CharacterTextSplitter()

with open("data.txt") as f:
    data = f.read()
texts = text_splitter.split_text(data)

docs = [Document(page_content=t) for t in texts[:3]]

chain = load_summarize_chain(llm, chain_type="map_reduce")
output_summary = chain.run(docs)
wrapped_text = textwrap.fill(output_summary, width=120)
print(wrapped_text)
```
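This works because `docs` here is a plain list of `Document` objects built with a list comprehension; no `(Document, score)` tuples and no bare `Document` ever reach the chain or the splitter.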
I have another workaround: update the method like this:

```python
@classmethod
def from_documents(
    cls: Type[VST],
    documents: List[Document],
    embedding: Embeddings,
    **kwargs: Any,
) -> VST:
    """Return VectorStore initialized from documents and embeddings."""
    metadatas = []
    texts = []
    for d in documents:
        # If a bare Document was passed instead of a list, iterating it
        # yields ('field_name', value) tuples, which are handled here.
        if d[0] == 'metadata':
            metadatas.append(d[1])
        elif d[0] == 'page_content':
            texts.append(d[1])
    # hand off to from_texts, as the stock implementation does
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
```
Experiencing the same issue with:

```python
original_document = Document(page_content=message, metadata={})
split_documents = self.text_splitter.split_documents(original_document)
summary = self.summarize_chain(split_documents)
```
@lindseypeng Did you find solution to the error?
@flynnoct I believe `split_documents` takes a list of documents, so your problem may be fixed by using:

```python
self.text_splitter.split_documents([original_document])
```
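For completeness, a self-contained sketch of the corrected flow; `CharacterTextSplitter` and `load_summarize_chain` are stand-ins here, since the original comment's `self.text_splitter` and `self.summarize_chain` may be configured differently:

```python
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI

text_splitter = CharacterTextSplitter()
summarize_chain = load_summarize_chain(OpenAI(), chain_type="map_reduce")

message = "long text to summarize ..."
original_document = Document(page_content=message, metadata={})

# split_documents expects a list, so wrap the single Document
split_docs = text_splitter.split_documents([original_document])
summary = summarize_chain.run(split_docs)
```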
Hi, @vmunagal! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking the issue titled "AttributeError: 'tuple' object has no attribute 'page_content'" as stale.
Based on my understanding, this issue is causing an attribute error when trying to format a document into a string based on a prompt template. PawelFaron requested you to provide working code that can reproduce the issue, and OlajideOgun and AliAkhtari78 also experienced the same issue and provided workarounds. lindseypeng shared an example that reproduces the error.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!