bedrock-claude-chat icon indicating copy to clipboard operation
bedrock-claude-chat copied to clipboard

feat: AI-OCR for PDFs

Open fsatsuki opened this issue 1 year ago • 1 comments

Issue #, if available:

Using AI for PDF OCR

Description of changes:

PDFs contain various types of images, graphs, charts, designs, objects, etc., and these PDFs are easy for people to understand. However, typical OCR tools can't understand relationships between objects. Tabular tables are single-column strings, so it's difficult to infer table relationships from strings. The AI-OCR feature converts PDFs to images one by one and supports OCR using Claude3's multimodal features. As a result, structured markdown text can be retrieved and used as RAG knowledge.

The DB schema has changed. Add “Metadata” as JSON. In this PR, images converted from PDFs are stored in S3 buckets. It is placed in the source image URL and placed in the original pdfurl of the metadata.parentSource.

Screenshot 2024-07-18 at 14 04 46 Screenshot 2024-07-18 at 14 25 35 Screenshot 2024-07-18 at 14 26 02 stepfunctions_graph (2)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

fsatsuki avatar Jul 18 '24 05:07 fsatsuki

Memo: When the bedrock knowledge base retrieve api supports detail reference chunk, this PR could be a good reference.

statefb avatar Jul 19 '24 07:07 statefb