archive-pdf-tools Add/implement regression tests for MRC

We can use the scripts in tools/ to perform the MRC compression separately, merge the final result, and create a diff between the original image and the MRC-compressed image. With a database of test images, we could then change the algorithm and see how it performs against known data.
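As a rough illustration of the "diff" step, here is a minimal sketch that compares two grayscale images pixel by pixel and summarizes the difference as PSNR. It assumes both images have already been decoded into equally sized lists of rows with values in 0..255; the function names `pixel_diff` and `psnr` are made up for this example and are not part of the scripts in tools/.

```python
import math


def pixel_diff(original, compressed):
    """Return the per-pixel absolute difference of two grayscale images.

    Both images are lists of rows of integers (hypothetical in-memory
    representation; decoding the actual files is out of scope here).
    """
    if not original or len(original) != len(compressed) or any(
        len(a) != len(b) for a, b in zip(original, compressed)
    ):
        raise ValueError("images must be non-empty and have the same dimensions")
    return [
        [abs(a - b) for a, b in zip(row_o, row_c)]
        for row_o, row_c in zip(original, compressed)
    ]


def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    diff = pixel_diff(original, compressed)
    count = sum(len(row) for row in diff)
    mse = sum(d * d for row in diff for d in row) / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

PSNR is only one possible summary; the diff image returned by `pixel_diff` could also be inspected directly to see where the compressed page deviates.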

Sep 18 '21 11:09 MerlijnWajer

Let's start with a list of items (or: pages from items) to test on (will edit as time goes on)

https://archive.org/details/hebrewphrasebook0000wist
https://archive.org/details/sim_english-illustrated-magazine_1884-12_2_15
https://archive.org/details/commercialatlaso00lloy
https://archive.org/details/lachartreusedepa0000unse_o2m4
https://archive.org/details/newpaintingimpre0000unse_s6o0
https://archive.org/details/recoveryofrhetor0000unse
https://archive.org/details/sim_accent_1986-11_11_11
https://archive.org/details/sim_achper-australia-healthy-lifestyles-journal_1976-12_74
https://archive.org/details/sim_american-journal-of-occupational-therapy_1991_45_supplement
https://archive.org/details/sim_r-f-design_1989-1990_12-13_cumulative-index
https://archive.org/details/sixofcrows0000bard
https://archive.org/details/zhongguoxuesheng0001unse
https://archive.org/details/sim_journal-of-burn-care-research_1983-02_4_1
https://archive.org/details/lanjingdeyanjing0000bing
https://archive.org/details/cd_le-ceneri-di-heliodoro_rome
https://archive.org/details/alienatemyhomewo0004swar
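
With a fixed list like this, a regression check could come down to comparing a per-item quality score against a stored baseline. The sketch below assumes such scores already exist (for example a PSNR per item); `find_regressions`, the `tolerance` parameter, and the numbers in the usage example are hypothetical placeholders, not measurements.

```python
def find_regressions(baseline, current, tolerance=0.5):
    """Return items whose score dropped by more than `tolerance`.

    `baseline` and `current` map an item identifier to a quality score
    where higher is better. Items missing from `current` are skipped.
    """
    regressions = {}
    for identifier, old_score in baseline.items():
        new_score = current.get(identifier)
        if new_score is None:
            continue  # item was not re-run
        if old_score - new_score > tolerance:
            regressions[identifier] = (old_score, new_score)
    return regressions


# Placeholder scores for two identifiers from the list above.
baseline = {"hebrewphrasebook0000wist": 38.0, "sixofcrows0000bard": 40.0}
current = {"hebrewphrasebook0000wist": 38.2, "sixofcrows0000bard": 36.5}
flagged = find_regressions(baseline, current)
```

Here `flagged` contains only `sixofcrows0000bard`, since its placeholder score dropped by more than the tolerance.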

Sep 23 '21 15:09 MerlijnWajer

When testing on this nearly perfect image, I still see a lot to improve before the output approaches that quality. The jbig2-picture needs improvement: not only should the images contain less fuzz, but the foreground-background separation also needs work. The resulting background contains fuzz that shouldn't be there. So I think this is a good image to work on.

Jun 29 '22 15:06 rmast