ocr-fileformat
page to text: rewrite
- supports recursive ReadingOrder (can be disabled via param order=document)
- supports setting the hierarchy level to extract from (default level=highest behaves as before)
- supports setting line/paragraph boundary strings (params lb and pb)
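
As a rough illustration of how the `order`, `lb` and `pb` parameters listed above might interact, here is a minimal Python sketch. It is not the actual transformation code: the `Region` type, the `page_to_text` function, the in-memory data model and the default value `order="reading-order"` are all hypothetical names chosen for this example. Only the parameter names `order`, `lb`, `pb` and the value `order=document` come from the notes above. The `level` parameter is not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for a text region holding its lines as strings."""
    id: str
    lines: list[str]


def flatten(order):
    """Recursively flatten a nested reading order (lists of ids or sub-lists)."""
    for item in order:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def page_to_text(regions, reading_order=None, order="reading-order",
                 lb="\n", pb="\n\n"):
    """Join lines with `lb` and regions (paragraphs) with `pb`.

    Assumption: with order="document" the regions are kept in document
    order; any other value follows the (possibly nested) reading order.
    """
    if order != "document" and reading_order:
        by_id = {r.id: r for r in regions}
        regions = [by_id[i] for i in flatten(reading_order) if i in by_id]
    return pb.join(lb.join(r.lines) for r in regions)


regions = [
    Region("r1", ["first line", "second line"]),
    Region("r2", ["heading"]),
    Region("r3", ["footnote"]),
]
nested_order = ["r2", ["r1", "r3"]]  # a group nested inside the top-level order

print(page_to_text(regions, nested_order))
print(page_to_text(regions, nested_order, order="document", lb=" ", pb=" | "))
```

The first call follows the nested reading order, so the heading comes first; the second ignores it and uses custom boundary strings instead of newlines.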