
Perhaps switch validator to support the newest versions of Java?

Open stuart-little opened this issue 3 years ago • 3 comments

I have found that the current VeraPDF validator fails on newer versions of Java:

$ java -version
openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment (build 15.0.2+7)
OpenJDK 64-Bit Server VM (build 15.0.2+7, mixed mode)

and then

$ ./pdf2archive --validate ~/Downloads/boo.pdf 
=== Welcome to PDF2ARCHIVE ===
GPL Ghostscript 9.54.0: Unrecoverable error, exit code 1
  Creating the definition file...
  Compressing PDF & embedding fonts...
  Converting to PDF/A-1B...
  Removing temporary files...
  Done, now ESSE3 is happy! ;)
  Validating...
./verapdf/verapdf: line 125: [: 150 2021-01-19: integer expression expected
Exception in thread "main" java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException
        at org.verapdf.apps.Applications.createConfigManager(Applications.java:75)
        at org.verapdf.apps.Applications.createAppConfigManager(Applications.java:87)
        at org.verapdf.cli.VeraPdfCli.<clinit>(VeraPdfCli.java:46)
        at org.verapdf.apps.GreenfieldCliWrapper.main(GreenfieldCliWrapper.java:34)
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 4 more

On the other hand, I've found a Java PDF library (PDFBox) that comes with a convenient command-line validator (a jar file).

I have checked that it agrees with your current vera validator on a number of files, both pre- and post-conversion. Before:

$ java -jar ~/Downloads/preflight-app-3.0.0-RC1.jar ~/Downloads/boo.pdf
---
The file boo.pdf is not a valid PDF/A-1b file, error(s) :
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 471948
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 475853
1.2.1 : Body Syntax error, EOL expected before the 'endobj' keyword at offset 468071
...

After:

$ java -jar ~/Downloads/preflight-app-3.0.0-RC1.jar ~/Downloads/boo-PDFA.pdf
---
The file boo-PDFA.pdf is a valid PDF/A-1b file

This works on both

$ java -version
---
openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment (build 15.0.2+7)
OpenJDK 64-Bit Server VM (build 15.0.2+7, mixed mode)

and

$ java -version
---
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

(the two versions I have handy right now).
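The before/after check above could be scripted roughly as follows. This is only a sketch: `PREFLIGHT_JAR`, `preflight_says_valid`, and `validate_pdfa` are hypothetical names, and the success test relies solely on the "is a valid PDF/A-1b file" message shown in the output above rather than on the jar's exit status, which has not been verified here.

```shell
# Hypothetical wrapper around the preflight jar. PREFLIGHT_JAR is an assumed
# variable pointing at wherever the jar was downloaded.
PREFLIGHT_JAR=${PREFLIGHT_JAR:-$HOME/Downloads/preflight-app-3.0.0-RC1.jar}

# Succeeds only if the given preflight report says the file is valid.
# "is not a valid ..." does not contain "is a valid", so it is rejected.
preflight_says_valid() {
    printf '%s\n' "$1" | grep -q 'is a valid PDF/A-1b file'
}

# Runs preflight on one file, prints its report, and turns the report
# into an exit status (0 = valid, non-zero = not valid).
validate_pdfa() {
    output=$(java -jar "$PREFLIGHT_JAR" "$1" 2>&1)
    printf '%s\n' "$output"
    preflight_says_valid "$output"
}
```

With these defined, something like `validate_pdfa ~/Downloads/boo-PDFA.pdf && echo OK` would report the result on either Java version.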

stuart-little, Jun 04 '21 15:06

Hi @stuart-little, finally got some time to comment on this as well.

Thanks for pointing out the preflight app in PDFBox; I myself sometimes use PDFBox for some other stuff, so I'll definitely check it out.

The problem with VeraPDF is almost surely caused by the fact that the version of VeraPDF shipped with PDF2ARCHIVE is super old (like ~4 years). In the past I've had to fix compatibility with newer Java versions for similar reasons, i.e. changes in the Java libraries.
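For context, the `NoClassDefFoundError` in the log is consistent with `javax.xml.bind` (JAXB) no longer shipping with the JDK as of Java 11. The earlier "integer expression expected" message suggests that a version check received `150 2021-01-19` instead of a number, presumably because newer `java -version` banners append a date after the quoted version string; that reading is an inference from the log, not something checked against the launcher script. A minimal sketch of a version check that tolerates both banner styles, with `java_major_version` as a hypothetical helper name:

```shell
# Hypothetical helper (not taken from the VeraPDF launcher): print the major
# Java version, given the first line of `java -version` output.
java_major_version() {
    # Pull out the quoted version string, e.g. 15.0.2 or 1.8.0_292.
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        '')  return 1 ;;                  # no quoted version string found
        1.*) version=${version#1.} ;;     # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    # Keep only the leading digits: 15.0.2 -> 15, 8.0_292 -> 8.
    printf '%s\n' "${version%%[!0-9]*}"
}
```

It could be used as `java_major_version "$(java -version 2>&1 | head -n 1)"`, and the result compared with `-ge` or `-lt` without tripping the integer test.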

Can you please just try to upgrade VeraPDF? It's likely that just upgrading will fix the error with Java, so it's not really an issue with the validator per se; it's more of an issue with the shipped version. To install a newer version of VeraPDF, run the following in a terminal:

wget http://downloads.verapdf.org/rel/verapdf-installer.zip
unzip verapdf-installer.zip
cd verapdf-<version>
./verapdf-install.sh

and, when it asks where to install it, specify <location_of_pdf2archive>/verapdf as the installation path. It should detect that an older version is installed and ask whether you want to overwrite the old files; just say 'yes' and it should install flawlessly. At that point, try again with pdf2archive and let's see if that solves the issues with Java. 🙂

I'll try to upload an updated version of VeraPDF to GitHub if you can confirm that this solves the problem. I would probably not switch the default validator just like that, but I think it could be worth adding the option to use a different supported validator like PDFBox. That would stay in 'feature-paradise' for a while, though, as right now I barely have any time at all. 🙂
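As a rough idea of what such an option might look like, here is a sketch only. `run_validator`, the validator names, and the `PREFLIGHT_JAR` variable are all hypothetical; nothing like this exists in pdf2archive today, and the real VeraPDF invocation may need additional options.

```shell
# Hypothetical dispatch, not part of pdf2archive: run the chosen validator
# on a file. Paths and the PREFLIGHT_JAR variable are assumptions.
run_validator() {
    validator=$1
    file=$2
    case "$validator" in
        verapdf)   ./verapdf/verapdf "$file" ;;
        preflight) java -jar "${PREFLIGHT_JAR:-preflight-app.jar}" "$file" ;;
        *)         echo "unknown validator: $validator" >&2
                   return 2 ;;
    esac
}
```

A call such as `run_validator preflight boo-PDFA.pdf` would then select the PDFBox tool, while VeraPDF could stay the default.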

matteosecli, Jun 24 '21 10:06

Thanks! I'll give that a go.

stuart-little, Jun 24 '21 12:06

I wanted to confirm that, a year later, the instructions above (get the latest VeraPDF and install it into the pdf2archive/verapdf directory) work fine and fix the Java version errors.

sevagh, Aug 12 '22 13:08