helix
better pdf to markdown
current pdf text extraction doesn't generate markdown and includes a lot of cruft
https://github.com/VikParuchuri/marker looks like it might do a better job; give it a try
in particular, two-column layouts (common in academic papers) cause absolute mayhem with the current extraction, and i'm surprised the model can make sense of the result at all
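
a rough sketch of why two-column pages go wrong, assuming the extractor emits text blocks sorted top-to-bottom and then left-to-right (an assumption, not a confirmed description of the current extractor or of how marker works). the `Block` type, the coordinates, and the fixed midline split are all hypothetical and only there for illustration:

```python
from typing import NamedTuple


class Block(NamedTuple):
    x: float   # left edge of the text block
    y: float   # top edge (smaller means higher on the page)
    text: str


def naive_order(blocks):
    """Row-major order: top to bottom, then left to right.

    On a two-column page this interleaves lines from both columns.
    """
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, page_width):
    """Read the whole left column, then the whole right column.

    Assumes exactly two columns split at the page midline, which is a
    simplification; real layouts need actual column detection.
    """
    mid = page_width / 2
    left = sorted((b for b in blocks if b.x < mid), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= mid), key=lambda b: b.y)
    return [b.text for b in left + right]


# hypothetical two-column page, 600 units wide
PAGE = [
    Block(50, 100, "left line 1"),
    Block(320, 100, "right line 1"),
    Block(50, 120, "left line 2"),
    Block(320, 120, "right line 2"),
]

if __name__ == "__main__":
    print(naive_order(PAGE))
    print(column_order(PAGE, page_width=600))
```

with the naive ordering, each line of the left column is followed by an unrelated line from the right column, which is the kind of scrambling described above.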