archive-hocr-tools

Make tools more usable with pipes

Open MerlijnWajer opened this issue 3 years ago • 0 comments

Many of the tools currently cannot read from special files such as /dev/stdin in bash, or more generally accept input from stdin. This is because of some unnecessary seeks.
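To illustrate the kind of change this implies, here is a minimal sketch of seek-free input handling. The helper names `open_hocr_input` and `read_chunks`, and the convention that `-` means stdin, are assumptions for illustration only and are not existing functions in the tools.

```python
import sys


def open_hocr_input(path):
    """Return a binary file object for `path`.

    Hypothetical convention: '-' selects stdin, so the same code path
    serves regular files, /dev/stdin and pipes.
    """
    if path == "-":
        return sys.stdin.buffer
    return open(path, "rb")


def read_chunks(stream, chunk_size=65536):
    """Yield successive chunks using only read(); never calls seek() or tell()."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Because only `read()` is used, the same loop should behave identically on a regular file and on a pipe, where `seek()` raises an error.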

Additionally, it would be nice to add some features to filter by, for example, word confidence. This could be done in hocr-text, but we could also have a streaming hocr filter tool that takes hocr as input and also outputs hocr, but only lets words above a certain confidence pass. This would need to be streaming, which makes it a little tricky, but it would be cool to, for example, pipe Tesseract output directly into such a tool.
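As a rough sketch of the simpler, text-output variant of this idea, the following uses the standard library's `xml.etree.ElementTree.iterparse`, which reads its input sequentially. The function name `iter_confident_words` is hypothetical, and the sketch assumes that words are `span` elements with class `ocrx_word` carrying an `x_wconf` value in their `title` attribute, and that a word without `x_wconf` counts as confidence 0. It yields text rather than hocr; a real hocr-to-hocr filter would additionally have to re-serialise the surrounding markup.

```python
import re
import xml.etree.ElementTree as ET

# Matches the word confidence property inside an hOCR title attribute.
WCONF_RE = re.compile(r"x_wconf\s+(\d+(?:\.\d+)?)")


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def iter_confident_words(stream, min_conf):
    """Yield (text, confidence) for each ocrx_word with x_wconf >= min_conf.

    `stream` is a binary file object; it is only read sequentially.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        classes = elem.get("class", "").split()
        if local_name(elem.tag) == "span" and "ocrx_word" in classes:
            match = WCONF_RE.search(elem.get("title", ""))
            # Assumption: a missing x_wconf is treated as confidence 0.
            conf = float(match.group(1)) if match else 0.0
            text = "".join(elem.itertext()).strip()
            if text and conf >= min_conf:
                yield text, conf
        elif "ocr_page" in classes:
            # Drop the children of a finished page to keep memory bounded.
            elem.clear()
```

Passing `sys.stdin.buffer` as `stream` would let this sit at the end of a pipe, since nothing here needs the input to be seekable.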

MerlijnWajer, Sep 28 '21 18:09