phidata
PDFImageReader not working with PDFKnowledgeBase
Running the following code gives this error:
from phi.assistant import Assistant
from phi.document.reader.pdf import PDFImageReader
from phi.knowledge.pdf import PDFKnowledgeBase
from phi.vectordb.lancedb.lancedb import LanceDb  # type: ignore
db_url = "/tmp/lancedb"  # Optional
# Create a knowledge base with the PDFs from the data/pdfs directory
knowledge_base = PDFKnowledgeBase(
    path="data/pdfs",
    vector_db=LanceDb(uri=db_url),
    reader=PDFImageReader(chunk=True),
)
# Load the knowledge base
knowledge_base.load(recreate=False)
# Create an assistant with the knowledge base
assistant = Assistant(
    knowledge_base=knowledge_base,
    add_references_to_prompt=True,
)
# Ask the assistant about the knowledge base
assistant.print_response("Summarize this document.", markdown=True)
Error -
INFO Creating table: phi
Traceback (most recent call last):
File "/Users/siyer/PycharmProjects/report-call-summarizer/localdb-lancedb-knowledgebase.py", line 10, in <module>
knowledge_base = PDFKnowledgeBase(
^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/report-call-summarizer/lib/python3.11/site-packages/pydantic/main.py", line 164, in __init__
__pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PDFKnowledgeBase
reader
Input should be a valid dictionary or instance of PDFReader [type=model_type, input_value=PDFImageReader(chunk=True...\n\r', '\t', ' ', ' ']), input_type=PDFImageReader]
For further information visit https://errors.pydantic.dev/2.5/v/model_type
Process finished with exit code 1
This looks like a one-line change in pdf.py; the reader field should be of type Reader:
https://github.com/phidatahq/phidata/blob/9b2653f2c5ff77c4babf44ac324f5568ee69f856/phi/knowledge/pdf.py#L11
class PDFKnowledgeBase(AssistantKnowledge):
    path: Union[str, Path]
    reader: Reader = PDFReader()
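For anyone wondering why widening the annotation fixes this, here is a minimal standard-library sketch of the idea. It does not use phi or pydantic: the classes below are hypothetical stand-ins, `check_reader` only roughly imitates the instance check behind pydantic's `model_type` error, and the assumption that `PDFImageReader` is a sibling of `PDFReader` (not a subclass) is inferred from the traceback above.

```python
from dataclasses import dataclass, field


class Reader:
    """Hypothetical stand-in for the base reader class."""


class PDFReader(Reader):
    """Stand-in for the text-based PDF reader."""


class PDFImageReader(Reader):
    """Stand-in for the image/OCR reader; assumed to be a sibling of PDFReader."""


def check_reader(value: object, expected: type) -> None:
    # Rough imitation of the validator: only instances of the annotated class pass.
    if not isinstance(value, expected):
        raise TypeError(f"Input should be an instance of {expected.__name__}")


@dataclass
class NarrowKnowledgeBase:
    # Mirrors a field annotated with the concrete PDFReader type.
    reader: Reader = field(default_factory=PDFReader)

    def __post_init__(self) -> None:
        check_reader(self.reader, PDFReader)


@dataclass
class WideKnowledgeBase:
    # Mirrors the proposed fix: annotate with the base Reader type.
    reader: Reader = field(default_factory=PDFReader)

    def __post_init__(self) -> None:
        check_reader(self.reader, Reader)


def accepts(kb_cls: type, reader: Reader) -> bool:
    """Return True if kb_cls can be constructed with the given reader."""
    try:
        kb_cls(reader=reader)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(accepts(NarrowKnowledgeBase, PDFImageReader()))
    print(accepts(WideKnowledgeBase, PDFImageReader()))
```

With the narrow annotation only `PDFReader` instances get through, while the base-class annotation accepts any `Reader` subclass and still keeps `PDFReader()` as the default.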
cool
Hi Team. Do we have an update on when we can get a new release with this change?
@sridharaiyer PR will be out shortly and most likely we will be releasing a new version by EOD
The PR is out @sridharaiyer. You are welcome to test
Tested. Works fine for my use case, thanks a lot!
Merged