Fallback mode no different from regular mode in 0.18.8.rc1
In previous versions (e.g. 0.14.6), fallback mode rendered the whole PDF page into a background image and layered invisible HTML text on top of it. As a result, a user could copy and paste the invisible text and crop fragments of interest from the fully rendered background. This is the expected result. I ran this version from Docker.
In the current version, 0.18.8.rc1, fallback mode seems no different from regular mode: the background image is minimal and all of the text is rendered as visible HTML. This is probably a bug. I ran this version from the .deb package.
The background image generated for the same page differs between the two versions: 7.9K with 0.18.8.rc1 versus 354K with 0.14.6, as the listings below show.
Is this intentional behaviour or a bug?
Steps to reproduce: download https://www.nasa.gov/pdf/298870main_SP-2008-565.pdf and confirm its checksum:
puchacz@vbkub01:~/tmp$ sha256sum 298870main_SP-2008-565.pdf
f0fe4c92722b17dbed398a4b582bd1dfb16c365d6ccd2b58a20b7ed7be5bcc21 298870main_SP-2008-565.pdf
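To make sure both builds are fed the same input, the checksum comparison can be scripted. This is only a minimal sketch: `verify_sha256` is a hypothetical helper name of mine, not part of pdf2htmlEX, and it assumes GNU `sha256sum` is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: check that a file matches an expected SHA-256 digest.
# verify_sha256 is a hypothetical helper, not part of pdf2htmlEX.
# Usage: verify_sha256 FILE EXPECTED_HEX_DIGEST
# Returns 0 on a match, non-zero otherwise (including a missing file).
verify_sha256() {
    local actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$1" 2>/dev/null | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}
```

For the file in this report, the call would be `verify_sha256 298870main_SP-2008-565.pdf f0fe4c92722b17dbed398a4b582bd1dfb16c365d6ccd2b58a20b7ed7be5bcc21`.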
Incorrect output from the 0.18.8.rc1 build installed from the .deb package:
puchacz@vbkub01:~/tmp$ pdf2htmlEX -v
pdf2htmlEX version 0.18.8.rc1
Copyright 2012-2015 Lu Wang <[email protected]> and other contributors
Libraries:
poppler 0.89.0
libfontforge (date) 20200314
cairo 1.15.10
Default data-dir: /usr/local/share/pdf2htmlEX
Poppler data-dir: /usr/local/share/pdf2htmlEX/poppler
Supported image format: png jpg svg
puchacz@vbkub01:~/tmp$ pdf2htmlEX --dest-dir /home/puchacz/tmp/columbia-deb --fallback 1 --split-pages 1 --first-page 23 --last-page 23 --embed-font 0 --embed-image 0 /home/puchacz/tmp/298870main_SP-2008-565.pdf
Preprocessing: 1/1
Working: 1/1
puchacz@vbkub01:~/tmp$ ls -lh columbia-deb/bg17.png
-rw-rw-r-- 1 puchacz puchacz 7.9K Dec 29 23:44 columbia-deb/bg17.png
Result: a small file (7.9K) containing only the basic background.

Expected output from the 0.14.6 build run from Docker:

puchacz@vbkub01:~/tmp$ pdf2htmlEX-DOCKER -v
pdf2htmlEX version 0.14.6
Copyright 2012-2015 Lu Wang <[email protected]> and other contributors
Libraries:
poppler 0.26.5
libfontforge 20120731
Default data-dir: /usr/share/pdf2htmlEX
Supported image format: png jpg
puchacz@vbkub01:~/tmp$ pdf2htmlEX-DOCKER --dest-dir /home/puchacz/tmp/columbia-docker --fallback 1 --split-pages 1 --first-page 23 --last-page 23 --embed-font 0 --embed-image 0 /home/puchacz/tmp/298870main_SP-2008-565.pdf
Preprocessing: 1/1
Working: 1/1
puchacz@vbkub01:~/tmp$ ls -lh columbia-docker/bg17.png
-rw-r--r-- 1 root root 354K Dec 29 23:45 columbia-docker/bg17.png
Result: a large file (354K) containing the fully rendered page.
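
The comparison above can be scripted so that both builds run with identical options and the resulting background images are compared by size. This is a rough sketch, not an official test: `run_fallback_page` and `bg_size_ratio` are hypothetical helper names of mine, the options are copied verbatim from the commands above, and file size is only a crude proxy for "full page rendered" versus "minimal background".

```shell
#!/usr/bin/env bash
# Sketch: run two pdf2htmlEX builds with the same fallback options and
# compare the size of the background image each one produces.
# Both helpers are hypothetical, not part of pdf2htmlEX.

# Usage: run_fallback_page BINARY DEST_DIR PDF PAGE
# Runs BINARY with the exact options used in this report.
run_fallback_page() {
    "$1" --dest-dir "$2" --fallback 1 --split-pages 1 \
         --first-page "$4" --last-page "$4" \
         --embed-font 0 --embed-image 0 "$3"
}

# Print the size of a file in bytes (wc -c is more portable than stat).
file_size() {
    [ -f "$1" ] || { echo "error: no such file: $1" >&2; return 1; }
    wc -c < "$1" | tr -d '[:space:]'
}

# Usage: bg_size_ratio NEW_DIR OLD_DIR IMAGE_NAME
# Prints "<new bytes> <old bytes> <old/new integer ratio>".
bg_size_ratio() {
    local new_size old_size
    new_size=$(file_size "$1/$3") || return 1
    old_size=$(file_size "$2/$3") || return 1
    if [ "$new_size" -eq 0 ]; then
        echo "error: $1/$3 is empty" >&2
        return 1
    fi
    echo "$new_size $old_size $(( old_size / new_size ))"
}
```

With the paths from this report, that would be `run_fallback_page pdf2htmlEX ~/tmp/columbia-deb ~/tmp/298870main_SP-2008-565.pdf 23`, the same call with `pdf2htmlEX-DOCKER` and `~/tmp/columbia-docker`, and then `bg_size_ratio ~/tmp/columbia-deb ~/tmp/columbia-docker bg17.png`. A large ratio suggests the newer build did not render the full page into the background.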
