
better pdf to markdown

Open • lukemarsden opened this issue 1 year ago • 1 comment

The current PDF text extraction doesn't generate markdown, and its output includes a lot of cruft.

https://github.com/VikParuchuri/marker looks like it might do a better job; worth giving it a try.

lukemarsden, Feb 05 '24 13:02

In particular, two-column layouts, which are common in academic papers, cause absolute mayhem: the extracted text comes out jumbled, and I'm surprised the model can make sense of it at all.

lukemarsden, Feb 05 '24 13:02
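
To illustrate why two-column layouts go wrong, here is a minimal sketch of the reading-order problem. It assumes the extractor yields text blocks with page coordinates; the `Block` type, the function names, and the "split at the page midline" heuristic are all hypothetical and are not taken from helix or marker. A real layout model would need to handle full-width titles, figures, and uneven columns.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical extracted text block (not a real extractor type)."""
    x0: float  # left edge of the block
    y0: float  # top edge of the block (smaller means higher on the page)
    text: str


def naive_order(blocks):
    # Sort top-to-bottom, then left-to-right. On a two-column page this
    # interleaves lines from the left and right columns.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y0, b.x0))]


def column_order(blocks, page_width):
    # Assumed heuristic: a block belongs to the left column if its left
    # edge is left of the page midline. Read the whole left column first,
    # then the whole right column.
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x0 < mid), key=lambda b: b.y0)
    right = sorted((b for b in blocks if b.x0 >= mid), key=lambda b: b.y0)
    return [b.text for b in left + right]


blocks = [
    Block(50, 100, "L1"), Block(320, 100, "R1"),
    Block(50, 200, "L2"), Block(320, 200, "R2"),
]

print(naive_order(blocks))        # columns interleaved
print(column_order(blocks, 600))  # left column, then right column
```

The naive ordering is roughly what a plain top-to-bottom text dump produces, which is why sentences from the two columns end up spliced together.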