rmast

Results 184 comments of rmast

> The folders have some residual files. `ScanTailor` itself can now split tiffs, though I have no idea how to merge them as layers in a PDF. (That would be...

I saw JPEG2000 also has a composite JPM format, meant for MRC. I don't know if that has more possibilities than PDF already has, but as JPEG2000 is part of...

So one of the issues, background pictures containing fuzz behind the foreground, cannot occur with this reserveBlackAndWhite output. I don't think the surrounding pixels of that reservedBlackAndWhite are...

I don’t think a 1:1 mask generated from a binarized picture would reveal regions of interest. Most of the page is just fuzzy black or fuzzy white. A page with...

When I think of a way to compress the PostNL bill that I used before as a test subject, I could imagine using the high-density part for the square ocr_photo-frame...

The grey ABN AMRO text at the top of the ABN AMRO letter is recognized by tesseract as text of size 75 in a bounding box. The shield-logo to the left appears to be recognized...

So all text will be masked by JBIG2 and colored by a low-res mask-coloring picture, and photo elements will get ROI attention on the background picture. Usually that means text...

I experimented with didjvu via c44 a while ago: https://github.com/jwilk/didjvu/issues/19 With subsample ratios 3 to 5 for the background picture, c44 was able to almost clear the background (I guess...

I think I also have to propagate some findings on didjvu, probably coming from djvumake or c44. I compared background-pictures coming from both via didjvu, but c44 seems to blow...

Have you seen the [OCR-D project](https://github.com/OCR-D), which contains lots of binarizations? [OCR4all](https://github.com/OCR4all) tries to use it in an upcoming edition. Gamera 4 also provides some binarization algorithms, for example an...