How to combine tabular and non-tabular content from a PDF?

Open: tpanza opened this issue 1 year ago • 1 comment

Thanks for a great tool. I haven't seen this addressed anywhere, so I'll ask it here.

I have some large PDFs that consist of tables and some "regular" text. What I'd like to do is convert the PDF to a single HTML (or Markdown) file that does a simple text extract for the non-tabular parts, but then uses Camelot for the tabular parts, while keeping the overall order of the document intact.

Basically, keep all of the content in order, but with the tabular data appropriately formatted in HTML/Markdown. For my situation, I want to keep the surrounding context before and after the tables.

Is there a way to do this? If not, might someone point me to where in the Camelot code would be a good place to insert such a patch?

tpanza, May 22 '24 03:05

I'd love this too! I haven't found a free tool that does this yet.

bulrush15, Sep 19 '24 13:09
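
The thread has no answer yet. As a rough illustration of the ordering step the question describes, here is a minimal sketch that interleaves plain text lines and tables by their vertical position on one page and emits Markdown. It is not part of Camelot. The names `merge_page` and `table_to_markdown` and the input shapes are hypothetical, and the sketch assumes you already have a y-coordinate for each text line (from a text extractor of your choice) and a top/bottom y-coordinate plus the cell rows for each table (for example, rows taken from a Camelot table's `df`). How you obtain those coordinates is not shown. It also assumes PDF-style coordinates, where larger y means higher on the page, and a single-column layout.

```python
from typing import List, Sequence, Tuple

# Hypothetical input shapes (assumptions, not Camelot types):
#   TextLine: (y position of the line, text)
#   Table:    (y of top edge, y of bottom edge, rows of cell strings)
TextLine = Tuple[float, str]
Table = Tuple[float, float, List[Sequence[str]]]


def table_to_markdown(rows: List[Sequence[str]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    def fmt(row: Sequence[str]) -> str:
        # Escape pipes so cell text cannot break the table layout.
        cells = [str(cell).replace("|", "\\|") for cell in row]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


def merge_page(text_lines: List[TextLine], tables: List[Table]) -> str:
    """Combine text and tables from one page in top-to-bottom reading order."""
    items: List[Tuple[float, str]] = []

    for y, text in text_lines:
        # Skip text that lies within a table's vertical span, since the
        # table rendering already contains it.
        inside_table = any(bottom <= y <= top for top, bottom, _ in tables)
        if not inside_table:
            items.append((y, text))

    for top, _bottom, rows in tables:
        items.append((top, table_to_markdown(rows)))

    # Larger y is higher on the page, so sort in descending order.
    items.sort(key=lambda item: item[0], reverse=True)
    return "\n\n".join(body for _, body in items)


if __name__ == "__main__":
    page_text = [(700.0, "Intro"), (650.0, "a b"), (640.0, "1 2"), (600.0, "Outro")]
    page_tables = [(655.0, 635.0, [["a", "b"], ["1", "2"]])]
    print(merge_page(page_text, page_tables))
```

Running this per page and concatenating the results would keep the surrounding context before and after each table. The vertical-overlap check is deliberately crude: text sitting beside a table, or a multi-column layout, would need a comparison on x-coordinates as well.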