pdf-to-markdown
Improve parsing on dict.pdf
The parsed output for https://homepages.cwi.nl/~lex/files/dict.pdf doesn't look very good.
Things I already noticed:
- No TOC display (and strange header size detection, see #21)
- Characters are run together where they shouldn't be
- Page numbers are not detected and removed
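
For the page-number point, here is a rough sketch of one possible heuristic, not how pdf-to-markdown actually does it. It assumes each page is already available as a list of text lines; the function names and the `min_share` threshold are made up for illustration. A first or last line is treated as a page number only if it is purely numeric and increases in step with the page index on most pages.

```python
import re
from collections import Counter


def _as_number(line):
    """Return the integer value of a purely numeric line, else None."""
    match = re.fullmatch(r"\s*(\d+)\s*", line)
    return int(match.group(1)) if match else None


def strip_page_numbers(pages, min_share=0.6):
    """Remove header/footer page numbers from a list of pages.

    pages: list of pages, each a list of text lines (assumed input shape).
    min_share: hypothetical threshold, the share of pages that must agree.
    """
    pages = [list(page) for page in pages]  # work on a copy
    for position in (0, -1):  # check first lines, then last lines
        numbers = [_as_number(page[position]) if page else None for page in pages]
        # A real page number differs from the page index by a constant offset.
        offsets = Counter(n - i for i, n in enumerate(numbers) if n is not None)
        if not offsets:
            continue
        offset, hits = offsets.most_common(1)[0]
        if hits < 2 or hits < min_share * len(pages):
            continue
        for i, page in enumerate(pages):
            if numbers[i] is not None and numbers[i] - i == offset:
                del page[position]
        return pages
    return pages
```

Requiring a consistent offset across pages is meant to avoid deleting a numeric line that just happens to sit at the top or bottom of a single page.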
This was addressed in the PRs that close opendocsg/pdf2md#10