
Add/implement regression tests for MRC

Open MerlijnWajer opened this issue 4 years ago • 2 comments

We can use the scripts in tools/ to perform the MRC compression separately, merge the final result, and then diff the original image against the MRC-compressed output. With a database of images, we could then change the algorithm and see how it performs against known data.
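A minimal sketch of what the diff step could look like, assuming both the original page and the decoded MRC result have already been loaded as flat sequences of 8-bit samples of equal length. The function names, the choice of PSNR as the metric, and the tolerance value are illustrative assumptions, not part of archive-pdf-tools:

```python
import math


def mean_squared_error(original, decoded):
    """Mean squared error between two equally sized sequences of 8-bit samples."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of samples")
    if not original:
        raise ValueError("images must not be empty")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = mean_squared_error(original, decoded)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def check_regression(original, decoded, baseline_psnr, tolerance=0.5):
    """True if quality is no more than `tolerance` dB below the stored baseline.

    `baseline_psnr` is a hypothetical value recorded from an earlier run
    of the algorithm on the same test image.
    """
    return psnr(original, decoded) >= baseline_psnr - tolerance
```

A test run would compute the PSNR for each image in the database and fail when `check_regression` returns False. A perceptual metric might suit MRC output better than PSNR; it is used here only because it is simple to state.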

MerlijnWajer, Sep 18 '21 11:09

Let's start with a list of items (or pages from items) to test on. I will edit this list as time goes on:

  1. https://archive.org/details/hebrewphrasebook0000wist
  2. https://archive.org/details/sim_english-illustrated-magazine_1884-12_2_15
  3. https://archive.org/details/commercialatlaso00lloy
  4. https://archive.org/details/lachartreusedepa0000unse_o2m4
  5. https://archive.org/details/newpaintingimpre0000unse_s6o0
  6. https://archive.org/details/recoveryofrhetor0000unse
  7. https://archive.org/details/sim_accent_1986-11_11_11
  8. https://archive.org/details/sim_achper-australia-healthy-lifestyles-journal_1976-12_74
  9. https://archive.org/details/sim_american-journal-of-occupational-therapy_1991_45_supplement
  10. https://archive.org/details/sim_r-f-design_1989-1990_12-13_cumulative-index
  11. https://archive.org/details/sixofcrows0000bard
  12. https://archive.org/details/zhongguoxuesheng0001unse
  13. https://archive.org/details/sim_journal-of-burn-care-research_1983-02_4_1
  14. https://archive.org/details/lanjingdeyanjing0000bing
  15. https://archive.org/details/cd_le-ceneri-di-heliodoro_rome
  16. https://archive.org/details/alienatemyhomewo0004swar
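To turn a list like this into a test corpus, the item identifier can be taken from each details URL. The following is only a sketch: the helper name is hypothetical, and it assumes every entry has the `https://archive.org/details/<identifier>` shape used above.

```python
from urllib.parse import urlparse


def item_identifier(details_url):
    """Return the identifier from an archive.org details URL."""
    parts = [p for p in urlparse(details_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "details":
        raise ValueError(f"not a details URL: {details_url!r}")
    return parts[1]


# Two entries copied from the list above.
urls = [
    "https://archive.org/details/hebrewphrasebook0000wist",
    "https://archive.org/details/sixofcrows0000bard",
]
identifiers = [item_identifier(u) for u in urls]
```

The resulting identifiers could then key a table of per-item baseline results.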

MerlijnWajer, Sep 23 '21 15:09

When testing on this nearly perfect image, I still see a lot to improve before the output approaches that quality. The jbig2 picture needs improvement: not only should the images contain less fuzz, but the foreground-background separation also needs work. The resulting background contains fuzz that shouldn't be there. So I think this is a good image to work on.

rmast, Jun 29 '22 15:06