pdfalto
support discarding diagonal text, like pdftotext (xpdf version)
Diagonal text is normally useless for grobid training.
Indeed, it is often good to discard diagonal text in order to skip watermarks. However, if the ROTATION attribute is output (issue #109), the user could decide whether or not to use that information, since the angle is then available (e.g. ignore elements whose rotation is not 0, 90, 180 or 270 degrees).
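Below is a minimal sketch of the user-side filtering described above. It rests on a few assumptions. The ROTATION attribute is taken to appear on ALTO `String` elements and to be expressed in degrees, and a missing attribute is treated as unrotated text. Angles are snapped to right angles within a small tolerance. The helper names (`is_straight`, `straight_strings`) are hypothetical, and this is not pdfalto's own behaviour.

```python
import xml.etree.ElementTree as ET

# Right angles (degrees) treated as "straight" text; 360 covers values like 359.9.
RIGHT_ANGLES = (0, 90, 180, 270, 360)


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...alto/ns-v3#}'."""
    return tag.rsplit("}", 1)[-1]


def is_straight(elem: ET.Element, tol: float = 0.5) -> bool:
    """True if the element has no ROTATION (assumed 0) or a near-right-angle one."""
    rotation = elem.get("ROTATION")  # assumption: attribute name and unit (degrees)
    if rotation is None:
        return True
    angle = float(rotation) % 360  # normalise e.g. -90 -> 270
    return any(abs(angle - a) <= tol for a in RIGHT_ANGLES)


def straight_strings(alto_xml: str) -> list:
    """Return the CONTENT of every String element that is not diagonal."""
    root = ET.fromstring(alto_xml)
    return [
        e.get("CONTENT", "")
        for e in root.iter()
        if local_name(e.tag) == "String" and is_straight(e)
    ]


# Hypothetical sample: a 45-degree watermark word next to normal and vertical text.
sample = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout><Page><PrintSpace><TextBlock><TextLine>
<String CONTENT="Title"/>
<String CONTENT="DRAFT" ROTATION="45"/>
<String CONTENT="Margin" ROTATION="90"/>
</TextLine></TextBlock></PrintSpace></Page></Layout></alto>"""

print(straight_strings(sample))  # ['Title', 'Margin']
```

The tolerance keeps slightly imprecise angles (for example 89.99) from being treated as diagonal. A stricter or looser threshold is a user choice, which is the point of exposing the angle rather than dropping text inside pdfalto.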