unshred
Write a feature detector to find fragments of lines on shreds.
We need a feature detector that accepts a shred and tries to find fragments of lines on it. The proposed algorithm is:
- Remove pixels along the border of the shred to get rid of false positives.
- Apply adaptive binarisation to get rid of colour information.
- Detect lines using a Hough transform or similar.
- Ignore lines which are too short or lying too close to each other. That probably requires some adaptive algorithm and should take DPI information into account. Another fruitful idea might be filtering by a histogram of angles. Basically, we are looking for lines in order to find fragments of a table, so we would expect the found lines to fall into two buckets: those with an angle of X (+/- Y degrees) and those with an angle of X+90 (i.e. perpendicular). The rest can probably be discarded.
- Return the list of lines (including angles!)
- Try to suggest some auto tags, such as: has lines (the easy one), has parallel lines, has perpendicular lines.
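The angle-histogram filtering and the tag suggestions from the list above could be sketched roughly as follows. This is only an illustration, not the code from the existing partial solution: it assumes lines arrive as `(x1, y1, x2, y2)` tuples in pixels (e.g. from a probabilistic Hough transform), and the names `filter_by_angle` and `suggest_tags`, as well as the default thresholds, are made up here. The "too close" check and any DPI scaling of `min_length` are left out.

```python
import math
from collections import Counter


def line_angle(line):
    """Angle of a segment (x1, y1, x2, y2) in degrees, folded into [0, 180)."""
    x1, y1, x2, y2 = line
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def line_length(line):
    x1, y1, x2, y2 = line
    return math.hypot(x2 - x1, y2 - y1)


def angular_distance(a, b, period):
    """Smallest difference between two angles on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def filter_by_angle(lines, min_length=20.0, tolerance=5.0, bin_width=2.0):
    """Drop short lines, then keep only lines near the dominant angle X or X+90.

    All thresholds are assumed values; min_length is in pixels and would
    normally be derived from the scan DPI.
    """
    long_lines = [l for l in lines if line_length(l) >= min_length]
    if not long_lines:
        return [], None
    # Fold angles modulo 90 so that X and X+90 vote for the same bin,
    # and weight each vote by line length.
    n_bins = int(round(90.0 / bin_width))
    votes = Counter()
    for l in long_lines:
        bin_index = int((line_angle(l) % 90.0) // bin_width) % n_bins
        votes[bin_index] += line_length(l)
    best_bin = max(votes, key=votes.get)
    dominant = (best_bin + 0.5) * bin_width  # centre of the winning bin
    kept = [
        l for l in long_lines
        if angular_distance(line_angle(l) % 90.0, dominant, 90.0) <= tolerance
    ]
    return kept, dominant


def suggest_tags(lines, tolerance=5.0):
    """Suggest auto tags from pairwise angle differences."""
    tags = []
    if lines:
        tags.append("has lines")
    angles = [line_angle(l) for l in lines]
    pairs = [(a, b) for i, a in enumerate(angles) for b in angles[i + 1:]]
    if any(angular_distance(a, b, 180.0) <= tolerance for a, b in pairs):
        tags.append("has parallel lines")
    if any(abs(angular_distance(a, b, 180.0) - 90.0) <= tolerance for a, b in pairs):
        tags.append("has perpendicular lines")
    return tags


lines = [
    (0, 0, 100, 0),   # horizontal
    (0, 10, 100, 12), # almost horizontal
    (50, 0, 50, 80),  # vertical
    (0, 0, 60, 60),   # diagonal, should be discarded
    (0, 0, 5, 0),     # too short, should be discarded
]
kept, dominant = filter_by_angle(lines)
tags = suggest_tags(kept)
```

Folding angles modulo 90 keeps the two expected buckets in a single histogram peak, so a shred with only vertical table borders still votes for the same X as one with only horizontal ones. Note that the "has parallel lines" check as written would also fire for two collinear fragments of the same line, so it probably needs a distance test on top.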
Thanks to @mr-const and @xa4a we have a partial solution that needs some refinement.
For the existing solution we are looking for:
- Improved accuracy
- Heuristics to suggest some tags
- Ideally: some way to evaluate the algorithm against a ground-truth dataset.
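One possible shape for the ground-truth evaluation, as a rough sketch only: it assumes the ground truth is a hand-labelled list of `(x1, y1, x2, y2)` segments per shred, and that a detected line counts as correct when its angle and midpoint are close to an unmatched labelled line. The function names, the matching rule and both thresholds are assumptions for illustration; no such dataset or format exists in the repo yet as far as this issue says.

```python
import math


def _angle(line):
    """Angle of a segment in degrees, folded into [0, 180)."""
    x1, y1, x2, y2 = line
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def _midpoint(line):
    x1, y1, x2, y2 = line
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def lines_match(detected, truth, max_angle_diff=5.0, max_midpoint_dist=10.0):
    """Assumed matching rule: similar angle and nearby midpoints."""
    d = abs(_angle(detected) - _angle(truth)) % 180.0
    angle_diff = min(d, 180.0 - d)
    (ax, ay), (bx, by) = _midpoint(detected), _midpoint(truth)
    return (angle_diff <= max_angle_diff
            and math.hypot(ax - bx, ay - by) <= max_midpoint_dist)


def evaluate(detected, truth, **match_kwargs):
    """Greedy one-to-one matching; returns (precision, recall)."""
    unmatched = list(truth)
    true_positives = 0
    for d in detected:
        for t in unmatched:
            if lines_match(d, t, **match_kwargs):
                unmatched.remove(t)  # each labelled line is matched at most once
                true_positives += 1
                break
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = [(0, 0, 100, 0), (50, 0, 50, 80)]
detected = [(0, 1, 100, 2), (0, 0, 60, 60)]
precision, recall = evaluate(detected, truth)
```

Precision drops when spurious lines get through and recall drops when real table lines are missed, so tracking both per shred would show whether a change to the filtering thresholds actually helps.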
The idea of building a histogram of angles/lengths to filter out false positives seems fruitful to me. Also, check the PR comments I made the other day.
This is under development in #8.