scantailor-advanced

Despeckle by area size

Open Piolie opened this issue 4 years ago • 3 comments

The current despeckle algorithm works well most of the time. However, I have seen it fail even for tiny particles if they are very near the rest of the content (for example, in between text lines). Raising the Despeckle level does not improve the result. On the contrary, the algorithm starts eating away the dots over the letter i and the full stops.

I think it would be nice to have the option to erase all black/white areas that have a pixel count below a settable threshold.

Currently this can be achieved by applying ImageMagick's connected-component labeling on the output of ScanTailor. The license is compatible with the GPL, so maybe it is easy to implement here.

Piolie avatar Aug 20 '19 22:08 Piolie

I don't need that implementation, as ST already has a connected-component labeling implementation and uses it internally.

I'll just add a despeckle option named threshold: all components smaller than the threshold value will be removed, no matter where they are located.
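For illustration only, here is a minimal Python sketch of the area-threshold idea described above. It is not ScanTailor's code: the function name `remove_small_components`, the list-of-rows image representation (1 = black, 0 = white) and the default 8-connectivity are all assumptions made for the example.

```python
from collections import deque


def remove_small_components(image, threshold, connectivity=8):
    """Return a copy of `image` with small black components erased.

    `image` is a list of rows containing 0 (white) or 1 (black).
    A component is erased when its pixel count is below `threshold`.
    Hypothetical helper for illustration, not part of ScanTailor.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    result = [row[:] for row in image]
    visited = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or visited[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            visited[y][x] = True
            component = [(y, x)]
            queue = deque(component)
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            # Erase the component if its area is below the threshold,
            # regardless of how close it is to other content.
            if len(component) < threshold:
                for py, px in component:
                    result[py][px] = 0
    return result
```

Because the decision depends only on the component's own pixel count, a speck between two text lines is treated the same as one in the margin. Small white areas inside black regions could be handled by running the same function on the inverted image and inverting the result.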

4lex4 avatar Aug 21 '19 05:08 4lex4

Not sure if the implementation would also allow for the following, which I would also find useful in this context:

Remove components thinner than a certain number of pixels, so that, for example, a hair could be removed even if it forms a long structure and covers more pixels than a printed dot, as long as it is thinner than any printed line.

As I said, I don't know whether the maths for it is already implemented, but it could be based on the number of pixels within the structure per pixel on the edge (1 for a single-pixel line, 2 for 2-pixel lines, etc.), or on the distance of "inner" pixels from the edge. Or maybe there's a smarter algorithm in either ImageMagick or ImageJ.
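To make the second idea concrete (distance of "inner" pixels from the edge), here is a rough Python sketch. Everything in it is an assumption for illustration: the names `distance_to_background` and `remove_thin_components`, the use of city-block distance, treating pixels outside the image as background, and the list-of-rows image format (1 = black). It is not how ScanTailor, ImageMagick or ImageJ do it.

```python
from collections import deque

NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def distance_to_background(image):
    """City-block distance from each black pixel to the nearest white pixel.

    Pixels outside the image count as white (an assumption of this sketch).
    White pixels get distance 0.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    dist = [[0] * width for _ in range(height)]
    queue = deque()
    # Seed the search with black pixels that touch the background.
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            for dy, dx in NEIGHBORS_4:
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if not inside or image[ny][nx] != 1:
                    dist[y][x] = 1
                    queue.append((y, x))
                    break
    # Multi-source breadth-first search moves inward one layer at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS_4:
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and image[ny][nx] == 1 and dist[ny][nx] == 0):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def remove_thin_components(image, max_depth):
    """Erase 8-connected black components whose deepest pixel is at most
    `max_depth` away from the background. Hypothetical helper."""
    height = len(image)
    width = len(image[0]) if height else 0
    dist = distance_to_background(image)
    result = [row[:] for row in image]
    visited = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or visited[y][x]:
                continue
            visited[y][x] = True
            component = [(y, x)]
            queue = deque(component)
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS_8:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            # The deepest pixel tells how thick the component is at its
            # thickest point, independent of its length or total area.
            depth = max(dist[py][px] for py, px in component)
            if depth <= max_depth:
                for py, px in component:
                    result[py][px] = 0
    return result
```

With this measure, a straight stroke 1 or 2 pixels wide has depth 1 and a stroke 3 or 4 pixels wide has depth 2, so `max_depth=1` would erase a long one-pixel hair while keeping a 3x3 dot that covers fewer pixels. A hair that touches a letter becomes part of that letter's component and would not be removed by this per-component test.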

Mister-Teatime avatar Nov 24 '19 01:11 Mister-Teatime

I already use the algorithm you described to remove long, thin components in the noise reduction step of the color segmentation. Yes, I am thinking of implementing the new option this way.

4lex4 avatar Jan 23 '20 20:01 4lex4