Improve PDF handling: increase size limit and extract images from PDFs

Open · karamvirsingh1998 opened this issue 2 months ago • 9 comments

@MaheshtheDev I've been testing supermemory and found two issues with PDF uploads:

Issue 1: The 10 MB limit is too small

  • Most research papers with images hit this limit
  • Users have to split files, which breaks context
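A minimal sketch of making the limit a configuration value instead of a hard-coded constant. The environment variable `SUPERMEMORY_MAX_PDF_MB`, the helpers `max_pdf_bytes` and `check_pdf_size`, and the reading of "10MB" as 10 * 1024 * 1024 bytes are all hypothetical; the issue does not show where supermemory actually enforces the limit.

```python
import os

# Hypothetical setting; the real config key in supermemory is not shown in this issue.
DEFAULT_MAX_PDF_MB = 10  # the current limit described above


class PdfTooLargeError(ValueError):
    """Raised when an uploaded PDF exceeds the configured size limit."""


def max_pdf_bytes() -> int:
    """Read the limit (in MB) from the environment, falling back to the default."""
    raw = os.environ.get("SUPERMEMORY_MAX_PDF_MB", "")
    try:
        mb = int(raw) if raw else DEFAULT_MAX_PDF_MB
    except ValueError:
        mb = DEFAULT_MAX_PDF_MB  # ignore malformed values
    if mb <= 0:
        mb = DEFAULT_MAX_PDF_MB
    return mb * 1024 * 1024


def check_pdf_size(data: bytes) -> None:
    """Reject an upload larger than the configured limit with a descriptive error."""
    limit = max_pdf_bytes()
    if len(data) > limit:
        raise PdfTooLargeError(
            f"PDF is {len(data) / (1024 * 1024):.1f} MB; "
            f"the limit is {limit // (1024 * 1024)} MB"
        )
```

This sketch only checks bytes already in memory; checking a declared size (for example a `Content-Length` header) before reading the body would avoid buffering oversized uploads at all.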

Issue 2: Images in PDFs are not processed

  • Charts, diagrams, and figures are ignored
  • Only text gets extracted
  • Important visual information is lost
  • Can't search for content that's in images

Example: I tried uploading a research paper with images, and the images were completely ignored.
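A rough, standard-library-only sketch of the detection step: it scans raw PDF bytes for image XObjects (`/Subtype /Image`) and pulls out those stored as plain JPEG (`/DCTDecode`), whose stream data is already a complete JPEG file. `scan_pdf_images` is a hypothetical helper, not supermemory code, and the byte-level regex scan is a heuristic assumption: it misses images inside compressed object streams, encrypted PDFs, and non-JPEG encodings, which a real PDF parser would handle.

```python
import re

# Heuristic byte-level scan. Assumes an unencrypted PDF whose image objects are
# not packed into /ObjStm object streams.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.S)


def scan_pdf_images(pdf: bytes) -> tuple[int, list[bytes]]:
    """Hypothetical helper: count image XObjects and return raw JPEG payloads.

    Only /DCTDecode images are returned, because their stream bytes are
    already a JPEG file; other encodings are counted but skipped.
    """
    image_count = 0
    jpegs: list[bytes] = []
    for match in OBJ_RE.finditer(pdf):
        body = match.group(3)
        # The object dictionary is everything before the "stream" keyword.
        header = body.split(b"stream", 1)[0]
        if not IMAGE_RE.search(header):
            continue
        image_count += 1
        if b"/DCTDecode" not in header:
            continue  # e.g. /FlateDecode pixels would need decoding first
        stream = STREAM_RE.search(body)
        if stream is None:
            continue
        data = stream.group(1)
        # A JPEG starts with SOI (FF D8) and ends with EOI (FF D9); this also
        # rejects chained filters such as [/FlateDecode /DCTDecode].
        end = data.rfind(b"\xff\xd9")
        if data.startswith(b"\xff\xd8") and end != -1:
            jpegs.append(data[: end + 2])
    return image_count, jpegs
```

The recovered JPEG bytes could then be passed to an OCR or vision model so that charts and figures become searchable text; in production, a full PDF library (for example PyMuPDF or pypdf) would be the safer choice than scanning bytes.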

What I want to fix

I want to work on both of these issues:

  1. Increase the PDF size limit

  2. Extract and process images from PDFs

Why I'm the right person

  • I've identified exactly where the issues are
  • I understand the codebase structure
  • I have a clear solution in mind
  • Ready to write tests and docs

I want to take ownership of this issue and submit a PR.


@MaheshtheDev Let me know if this sounds good!

karamvirsingh1998, Nov 03 '25 17:11

ENG-365

linear[bot], Nov 03 '25 17:11

@karamvirsingh1998 how are you planning on extracting and processing the images in the PDF?

MaheshtheDev, Nov 03 '25 18:11

Hey @MaheshtheDev, I can work on Issue 1. If there's a target size threshold, let me know. Thanks!

AntonVishal, Nov 04 '25 13:11

Sure @MaheshtheDev. Just to outline the high-level design: the current implementation relies purely on OCR-based extraction, which, while functional, fails to capture layout semantics and visual hierarchy. The improved architecture should follow a multi-stage hierarchical pipeline:

  1. Layout Detection & Structural Parsing: Use layout analysis to identify key regions
  2. Visual Context Encoding via Visual LLMs: Each detected region is passed through a Visual Language Model
  3. Text Semantic Chunking: The extracted textual regions are then semantically grouped
  4. Contextual Reconstruction Layer: Finally, merge both visual and textual embeddings to form a context-aware document representation

This retains page-level context and hierarchy.
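A minimal sketch of how the four stages could fit together, assuming a layout detector has already produced labeled regions (stage 1) and that `describe_visual` wraps some vision-language model (stage 2); both are hypothetical placeholders, as are `Region`, `Chunk`, `build_chunks`, `to_embedding_text`, and the region labels. Stage 3 swaps in a simple heading-and-size-based grouping instead of embedding-based semantic chunking, and stage 4 is reduced to attaching page and section metadata to each chunk before embedding rather than merging visual and textual embeddings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    """A labeled area from a (hypothetical) layout detector: stage 1 output."""
    page: int
    kind: str  # assumed labels: "heading", "text", "figure", "table"
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 in page coordinates
    text: str = ""      # text layer or OCR output for textual regions
    image: bytes = b""  # cropped pixels for visual regions


@dataclass
class Chunk:
    page: int
    section: str  # nearest preceding heading, so hierarchy survives chunking
    kind: str
    content: str


def reading_order(regions: list[Region]) -> list[Region]:
    # Top-to-bottom, then left-to-right per page (assumes a single-column layout).
    return sorted(regions, key=lambda r: (r.page, r.bbox[1], r.bbox[0]))


def build_chunks(
    regions: list[Region],
    describe_visual: Callable[[Region], str],
    max_chars: int = 1000,
) -> list[Chunk]:
    chunks: list[Chunk] = []
    section = ""
    buffer: list[str] = []
    buffer_page = 0

    def flush() -> None:
        nonlocal buffer
        if buffer:
            chunks.append(Chunk(buffer_page, section, "text", "\n".join(buffer)))
            buffer = []

    for region in reading_order(regions):
        if region.kind == "heading":
            flush()  # text gathered so far belongs to the previous section
            section = region.text.strip()
        elif region.kind in ("figure", "table"):
            flush()
            # Stage 2: a vision-language model turns pixels into searchable text.
            chunks.append(Chunk(region.page, section, region.kind, describe_visual(region)))
        else:
            # Stage 3 (simplified): merge consecutive text on the same page and
            # under the same heading until the chunk would exceed max_chars.
            size = sum(len(t) for t in buffer) + len(region.text)
            if buffer and (region.page != buffer_page or size > max_chars):
                flush()
            if not buffer:
                buffer_page = region.page
            buffer.append(region.text)
    flush()
    return chunks


def to_embedding_text(chunk: Chunk) -> str:
    # Stage 4 (simplified): prefix page and section so retrieved chunks keep
    # page-level context and document hierarchy.
    return f"[page {chunk.page}] [{chunk.section or 'untitled'}] [{chunk.kind}] {chunk.content}"
```

Figures and tables become their own chunks rather than being folded into surrounding text, so a search can land on a chart description directly while still carrying the page and section it came from.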

karamvirsingh1998, Nov 04 '25 15:11

Having all of this done on the client side or in the consumer app doesn't seem ideal to me. Most likely this has to be handled by the supermemory API itself. I'll talk to the team and get back to you soon.

MaheshtheDev, Nov 05 '25 02:11

Yes, not everything will be on the client side. Let me know the next steps, @MaheshtheDev. Thanks!

karamvirsingh1998, Nov 05 '25 16:11

Hey @MaheshtheDev , I’ve explored similar PDF image extraction problems while experimenting with document processing pipelines. I’d love to take up this issue and work on enabling image processing for PDFs. Could you please assign this to me?

ParagGhatage, Nov 12 '25 14:11

Thanks for exploring. However, this issue deals with supermemory API changes for processing images within PDFs.

MaheshtheDev, Nov 14 '25 20:11

Thanks for the clarification, @MaheshtheDev. I’m comfortable working on the Supermemory API side as well and would still like to take this issue. Let me know the constraints or direction, and I’ll proceed.

ParagGhatage, Nov 14 '25 20:11

Hey @MaheshtheDev, I'd love to work on image processing within PDFs. Can I propose my solution?

Elon7069, Dec 03 '25 20:12