Add/implement regression tests for MRC
We can use the scripts in tools/ to perform the MRC compression separately, merge the final result, and create a diff between the original image and the MRC compressed image. With a database of images, we could then change the algorithm and see how it performs against known data/images.
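As a rough sketch of what the "diff" step could report, the snippet below scores a compressed page against its original and compares that score to a stored baseline. Everything here is an assumption for illustration: the issue does not say which metric to use, PSNR is just one common choice, and the function names (`mean_squared_error`, `psnr`, `check_regression`) and the 0.1 dB tolerance are hypothetical, not part of the existing tools/ scripts.

```python
import math
from typing import Sequence


def mean_squared_error(original: Sequence[int], compressed: Sequence[int]) -> float:
    """Mean squared difference between two equally sized 8-bit pixel buffers."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of pixels")
    if not original:
        raise ValueError("images must not be empty")
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(original: Sequence[int], compressed: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the buffers are identical."""
    mse = mean_squared_error(original, compressed)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def check_regression(score: float, baseline: float, tolerance: float = 0.1) -> bool:
    """Return True if the score is no more than `tolerance` dB below the baseline.

    The tolerance is an assumed value; a real test suite would tune it.
    """
    return score >= baseline - tolerance
```

The buffers can be any sequence of 8-bit values, including `bytes`, so grayscale pixel data decoded from the original and from the merged MRC output could be passed in directly. A per-page baseline score would have to be recorded once for each item in the test set and updated deliberately when the algorithm improves.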
Let's start with a list of items (or pages from items) to test on (will edit as time goes on):
- https://archive.org/details/hebrewphrasebook0000wist
- https://archive.org/details/sim_english-illustrated-magazine_1884-12_2_15
- https://archive.org/details/commercialatlaso00lloy
- https://archive.org/details/lachartreusedepa0000unse_o2m4
- https://archive.org/details/newpaintingimpre0000unse_s6o0
- https://archive.org/details/recoveryofrhetor0000unse
- https://archive.org/details/sim_accent_1986-11_11_11
- https://archive.org/details/sim_achper-australia-healthy-lifestyles-journal_1976-12_74
- https://archive.org/details/sim_american-journal-of-occupational-therapy_1991_45_supplement
- https://archive.org/details/sim_r-f-design_1989-1990_12-13_cumulative-index
- https://archive.org/details/sixofcrows0000bard
- https://archive.org/details/zhongguoxuesheng0001unse
- https://archive.org/details/sim_journal-of-burn-care-research_1983-02_4_1
- https://archive.org/details/lanjingdeyanjing0000bing
- https://archive.org/details/cd_le-ceneri-di-heliodoro_rome
- https://archive.org/details/alienatemyhomewo0004swar
When testing on this nearly perfect image, I still see a lot to improve before we approach that quality. The jbig2-picture needs improvement: not only should the images contain less fuzz, but the foreground-background separation also needs work. The resulting background contains fuzz that shouldn't be there. So I think this is a good image to work on.