Improve parsing on dict.pdf

Open: jzillmann opened this issue 3 years ago • 1 comment

The pdf parsing of https://homepages.cwi.nl/~lex/files/dict.pdf doesn't look very appealing.

Things I already noticed:

  • No TOC display (and strange header size detection, see #21)
  • Characters are bunched together when they shouldn't be
  • Page numbers are not detected and removed

jzillmann, Feb 26 '21 11:02

This was addressed in the PRs that close opendocsg/pdf2md#10

LoneRifle, Feb 26 '21 11:02