pdfalto

Support discarding diagonal text like pdftotext (xpdf version)

Open elonzh opened this issue 2 years ago • 1 comment

Normally, diagonal text is useless for GROBID training.

elonzh, Apr 14 '22 09:04

Indeed, it is often good to discard diagonal text in order to skip watermarks. However, if the ROTATION attribute is output (issue #109), it would then be up to the user to decide whether or not to use that information, given that the rotation angle in degrees is available (e.g. ignore elements whose angle is not 0, 90, 180 or 270).
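For illustration, a minimal sketch of what such user-side filtering could look like in Python. It assumes that the ALTO output carries a `ROTATION` attribute in degrees on some elements (which elements would carry it depends on how issue #109 is resolved, so this is not confirmed here). The helper names `is_axis_aligned` and `drop_diagonal`, the tolerance value and the sample XML are all hypothetical and not part of pdfalto.

```python
import xml.etree.ElementTree as ET


def is_axis_aligned(rotation, tolerance=0.5):
    """Return True if the angle is within `tolerance` degrees of 0, 90, 180 or 270.

    Elements without a ROTATION attribute (None) or with an unparseable
    value are treated as not rotated and therefore kept.
    """
    if rotation is None:
        return True
    try:
        angle = float(rotation)
    except ValueError:
        return True
    deviation = angle % 90.0  # distance above the previous multiple of 90
    return min(deviation, 90.0 - deviation) <= tolerance


def drop_diagonal(element, tolerance=0.5):
    """Recursively remove child elements whose ROTATION is diagonal.

    Returns the number of removed elements. Attributes are not namespaced,
    so this also works when the elements are in the ALTO namespace.
    """
    removed = 0
    for child in list(element):  # copy, since we mutate while iterating
        if not is_axis_aligned(child.get("ROTATION"), tolerance):
            element.remove(child)
            removed += 1
        else:
            removed += drop_diagonal(child, tolerance)
    return removed


# Hypothetical, simplified ALTO-like fragment (assumption: ROTATION on TextBlock).
SAMPLE = """<Page>
  <TextBlock ID="b1" ROTATION="0"><TextLine><String CONTENT="Abstract"/></TextLine></TextBlock>
  <TextBlock ID="b2" ROTATION="45"><TextLine><String CONTENT="DRAFT"/></TextLine></TextBlock>
  <TextBlock ID="b3" ROTATION="90"><TextLine><String CONTENT="arXiv"/></TextLine></TextBlock>
</Page>"""

root = ET.fromstring(SAMPLE)
removed = drop_diagonal(root)
kept = [s.get("CONTENT") for s in root.iter("String")]
print(removed, kept)  # 1 ['Abstract', 'arXiv']
```

Vertical text (90 or 270 degrees, such as a sidebar stamp) is kept by this sketch; a user who only wants horizontal text could tighten the check to 0 and 180.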

kermitt2, Apr 15 '22 13:04