pdf2archive
Perhaps switch validator to support the newest versions of Java?
I have found that the current veraPDF validator errors on newer versions of Java:
$ java -version
openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment (build 15.0.2+7)
OpenJDK 64-Bit Server VM (build 15.0.2+7, mixed mode)
and then
$ ./pdf2archive --validate ~/Downloads/boo.pdf
=== Welcome to PDF2ARCHIVE ===
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
Creating the definition file...
Compressing PDF & embedding fonts...
Converting to PDF/A-1B...
Removing temporary files...
Done, now ESSE3 is happy! ;)
Validating...
./verapdf/verapdf: line 125: [: 150 2021-01-19: integer expression expected
Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException
at org.verapdf.apps.Applications.createConfigManager(Applications.java:75)
at org.verapdf.apps.Applications.createAppConfigManager(Applications.java:87)
at org.verapdf.cli.VeraPdfCli.<clinit>(VeraPdfCli.java:46)
at org.verapdf.apps.GreenfieldCliWrapper.main(GreenfieldCliWrapper.java:34)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 4 more
On the other hand, I've found this Java PDF library that comes with a convenient command-line validator (a jar file).
I have checked that it agrees with your current veraPDF validator on a number of files, both pre- and post-conversion.
Before:
$ java -jar ~/Downloads/preflight-app-3.0.0-RC1.jar ~/Downloads/boo.pdf
The file boo.pdf is not a valid PDF/A-1b file, error(s) :
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 471948
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 475853
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 468071
...
After:
$ java -jar ~/Downloads/preflight-app-3.0.0-RC1.jar ~/Downloads/boo-PDFA.pdf
The file boo-PDFA.pdf is a valid PDF/A-1b file
This works on both
$ java -version
openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment (build 15.0.2+7)
OpenJDK 64-Bit Server VM (build 15.0.2+7, mixed mode)
and
$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
(the two versions I have handy right now).
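For anyone who wants to repeat this kind of check over several files, here is a minimal sketch. It assumes the jar path used in the commands above and relies only on the message wording shown in the two runs ("is a valid PDF/A-1b file" / "is not a valid PDF/A-1b file"); the names preflight_verdict, check_all and PREFLIGHT_JAR are made up for illustration and are not part of pdf2archive or PDFBox.

```shell
# Hypothetical helper: classify one preflight report by the wording
# seen in the runs above. Anything else is reported as "unknown".
preflight_verdict() {
  case "$1" in
    *"is a valid PDF/A-1b file"*)     echo valid ;;
    *"is not a valid PDF/A-1b file"*) echo invalid ;;
    *)                                echo unknown ;;
  esac
}

# Assumed jar location (taken from the commands above); override as needed.
PREFLIGHT_JAR="${PREFLIGHT_JAR:-$HOME/Downloads/preflight-app-3.0.0-RC1.jar}"

# Hypothetical wrapper: run the jar on each file and print one verdict per line.
check_all() {
  for f in "$@"; do
    report=$(java -jar "$PREFLIGHT_JAR" "$f" 2>&1)
    printf '%s: %s\n' "$f" "$(preflight_verdict "$report")"
  done
}
```

Usage would be along the lines of `check_all ~/Downloads/boo.pdf ~/Downloads/boo-PDFA.pdf`. Matching on message text is fragile and could break if a different preflight release words its report differently.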
Hi @stuart-little, finally got some time to comment on this as well.
Thanks for pointing out the preflight app in PDFBox; I myself sometimes use PDFBox for some other stuff, so I'll definitely check it out.
The problem with VeraPDF is almost surely caused by the fact that the version of VeraPDF shipped with PDF2ARCHIVE is super old (~4 years). In the past I've had to fix compatibility with newer Java versions for similar reasons, i.e. changes in the Java libraries.
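To illustrate the kind of breakage involved: the "integer expression expected" line in the log is what a shell `[` test prints when it gets a non-numeric operand, here "150 2021-01-19", which looks like the release date that newer `java -version` output appends after the version string. (The second error is a separate matter: the javax.xml.bind classes were removed from the JDK in Java 11.) Below is a minimal sketch of a version parse that tolerates both the old 1.x scheme and the newer one. It is not the code from the shipped launcher script, and java_major_version is a hypothetical name.

```shell
# Hypothetical helper: print the major Java version, given the text of
# `java -version` as a single string argument.
java_major_version() {
  # Take the quoted version from the first line, e.g. 15.0.2 or 1.8.0_292.
  ver=$(printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;  # legacy scheme: 1.8.0_292 -> 8
    *)   printf '%s\n' "${ver%%[!0-9]*}" ;;            # newer scheme: 15.0.2 -> 15
  esac
}
```

Since `java -version` writes to stderr, a call would look like `java_major_version "$(java -version 2>&1)"`; the result is a plain integer that can safely be used in a numeric `[ ... -ge ... ]` comparison.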
Can you please just try to upgrade VeraPDF? It's likely that just upgrading will fix the error with Java, so it's not really an issue with the validator per se; it's more of an issue with the shipped version. To install a newer version of VeraPDF, run the following in a terminal:
wget http://downloads.verapdf.org/rel/verapdf-installer.zip
unzip verapdf-installer.zip
cd verapdf-<version>
./verapdf-install.sh
and, when it asks where to install it, specify <location_of_pdf2archive>/verapdf as the installation path. It should detect that an older version is installed and ask whether you want to overwrite the old files; just say 'yes' and it should install flawlessly. Then try again with pdf2archive and let's see if that solves the issues with Java. 🙂
I'll try to upload an updated version of VeraPDF to GitHub if you can confirm that this solves the problem. I would probably not switch the default validator just like that, but I think it could be worth adding the option to use a different supported validator like PDFBox. That would stay in 'feature-paradise' for a while, though, as right now I barely have any time at all. 🙂
Thanks! I'll give that a go.
I wanted to confirm that, a year later, the instructions above (get the latest VeraPDF and install it into the pdf2archive/verapdf directory) work fine to fix any Java version errors.