Improve PDF handling: increase size limit and extract images from PDFs
@MaheshtheDev I've been testing supermemory and found two issues with PDF uploads:
Issue 1: 10MB limit is too small
- Most research papers with images hit this limit
- Users have to split files which breaks context
Issue 2: Images in PDFs are not processed
- Charts, diagrams, and figures are ignored
- Only text gets extracted
- Important visual information is lost
- Can't search for content that's in images
Example: I tried uploading a research paper with images - the images were completely ignored.
What I want to fix
I want to work on both of these issues:
- Increase PDF limit
- Extract and process images from PDFs
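For the size limit, a minimal sketch of what a configurable check could look like. This is illustrative only: the `MAX_PDF_SIZE_MB` variable and both function names are hypothetical and not taken from the supermemory codebase; only the 10MB default comes from the behaviour described above.

```python
import os

# Default mirrors the 10MB limit reported in this issue.
DEFAULT_MAX_PDF_BYTES = 10 * 1024 * 1024


def max_pdf_bytes(env=None):
    """Return the upload limit in bytes.

    Reads a hypothetical MAX_PDF_SIZE_MB override and falls back to 10MB.
    """
    env = os.environ if env is None else env
    raw = env.get("MAX_PDF_SIZE_MB")
    if raw is None:
        return DEFAULT_MAX_PDF_BYTES
    megabytes = int(raw)
    if megabytes <= 0:
        raise ValueError("MAX_PDF_SIZE_MB must be a positive integer")
    return megabytes * 1024 * 1024


def check_pdf_size(size_bytes, limit_bytes):
    """Raise ValueError if the file exceeds the limit; otherwise return None."""
    if size_bytes > limit_bytes:
        limit_mb = limit_bytes / (1024 * 1024)
        raise ValueError(
            f"PDF is {size_bytes} bytes, which exceeds the {limit_mb:.0f}MB limit"
        )
```

Making the limit a setting rather than a constant would let it be raised without a code change, though where the real limit is enforced still needs to be confirmed.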
Why I'm the right person
- I've identified exactly where the issues are
- I understand the codebase structure
- I have a clear solution in mind
- Ready to write tests and docs
I want to take ownership of this issue and submit a PR.
@MaheshtheDev Let me know if this sounds good!
@karamvirsingh1998 how are you planning on extracting and processing the images in the PDF?
Hey @MaheshtheDev, I can work on Issue 1. If there's a target size threshold, let me know. Thanks
Sure @MaheshtheDev. Just to outline the high-level design: the current implementation relies purely on OCR-based extraction, which, while functional, will fail to capture layout semantics and visual hierarchy. The improved architecture should follow a multi-stage hierarchical pipeline:
- Layout Detection & Structural Parsing: Use layout analysis to identify key regions
- Visual Context Encoding via Visual LLMs: Each detected region is passed through a Visual Language Model
- Text Semantic Chunking: The extracted textual regions are then semantically grouped
- Contextual Reconstruction Layer: Finally, merge both visual and textual embeddings to form a context-aware document representation
This way it will retain page-level context and hierarchy.
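A rough sketch of how the stages above could fit together, assuming layout detection has already produced a list of regions. Everything here is hypothetical (`Region`, `Chunk`, `build_chunks`, and the `describe_figure` callback standing in for the Visual LLM call). Size-based grouping is a simple stand-in for semantic chunking, and the output is ordered text rather than merged embeddings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    page: int     # 1-based page number
    order: int    # reading order within the page
    kind: str     # "text" or "figure"
    content: str  # extracted text, or an image reference for figures


@dataclass
class Chunk:
    page: int
    kind: str
    text: str


def build_chunks(
    regions: List[Region],
    describe_figure: Callable[[Region], str],
    max_chars: int = 500,
) -> List[Chunk]:
    """Merge text and figure regions into page-ordered chunks.

    Text regions are grouped per page up to max_chars; each figure becomes
    its own chunk whose text comes from describe_figure (a placeholder for
    a visual model).
    """
    chunks: List[Chunk] = []
    buffer: List[str] = []
    buffer_page = None

    def flush() -> None:
        nonlocal buffer, buffer_page
        if buffer:
            chunks.append(Chunk(buffer_page, "text", " ".join(buffer)))
        buffer, buffer_page = [], None

    for region in sorted(regions, key=lambda r: (r.page, r.order)):
        if region.kind == "figure":
            flush()  # keep the figure at its position in reading order
            chunks.append(Chunk(region.page, "figure", describe_figure(region)))
            continue
        joined_len = sum(len(t) for t in buffer) + len(buffer) + len(region.content)
        if buffer and (region.page != buffer_page or joined_len > max_chars):
            flush()
        buffer.append(region.content)
        buffer_page = region.page

    flush()
    return chunks
```

Example usage with a stubbed figure describer:

```python
regions = [
    Region(1, 2, "figure", "img-1"),
    Region(1, 1, "text", "Intro."),
    Region(1, 3, "text", "After."),
]
chunks = build_chunks(regions, lambda r: f"[figure {r.content}]")
```

Because figures are emitted in reading order alongside the text, each chunk keeps its page number and position, which is the page-level context the design aims to preserve.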
Having all this done on the client side or consumer app side doesn't seem ideal to me. Most probably this has to work with the supermemory API itself. Will talk to the team and get back to you soon.
Yes, not all things will be on the client side. Let me know the next steps @MaheshtheDev. Thanks
Hey @MaheshtheDev , I’ve explored similar PDF image extraction problems while experimenting with document processing pipelines. I’d love to take up this issue and work on enabling image processing for PDFs. Could you please assign this to me?
Thanks for exploring. However, this issue deals with supermemory API-related changes for image processing within the PDF.
thanks for the clarification @MaheshtheDev , I’m comfortable working on the Supermemory API side as well and would still like to take this issue. Let me know the constraints or direction, and I’ll proceed.
Hey @MaheshtheDev, I would love to work on image processing within the PDF. Can I propose my solution?