Outdated JWAT version in WARC module

Open maeb opened this issue 6 years ago • 11 comments

Dev Effort

3D

Description

Current version is 1.0.3 https://github.com/openpreserve/jhove/blob/2e9444442cdaad7f749daa2db8538310ca1cc16f/jhove-ext-modules/pom.xml#L15

The most recent version in Maven repository is 1.1.1: https://mvnrepository.com/artifact/org.jwat/jwat-warc

Version 1.0.6 fixed a critical error: Payload digest should not be checked for revisit records

maeb avatar Feb 21 '19 14:02 maeb

Tests failed when I bumped the version. Will investigate and hopefully issue a pull request if I am able to fix it.

maeb avatar Feb 22 '19 09:02 maeb

I have fixed 6 of the 8 tests, but the last 2 are causing a NullPointerException in JWAT. I will follow up and try to fix it upstream (JWAT).

maeb avatar Feb 22 '19 14:02 maeb

Confirmed bug in JWAT (header.warcTypeIdx is null causing NullPointerException):

https://github.com/netarchivesuite/jwat/blob/5ae169ce839288aa5cf9927dd64d2fcc14bced69/jwat-warc/src/main/java/org/jwat/warc/WarcRecord.java#L287

It is triggered by missing WARC-Type header in https://github.com/openpreserve/jhove/blob/2e9444442cdaad7f749daa2db8538310ca1cc16f/jhove-ext-modules/src/test/resources/warc/invalid-warcheaderfieldpolicy-7.warc#L2

used by https://github.com/openpreserve/jhove/blob/2e9444442cdaad7f749daa2db8538310ca1cc16f/jhove-ext-modules/src/test/java/edu/harvard/hul/ois/jhove/module/WarcModuleTest.java#L354
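As a rough illustration of this failure mode, here is a minimal sketch. It assumes `warcTypeIdx` is a boxed `Integer` that stays null when the WARC-Type header is absent. The `Header` class, the `RT_IDX_REVISIT` constant and both methods are hypothetical stand-ins, not JWAT's actual code.

```java
public class WarcTypeGuard {
    // Hypothetical stand-in for a parsed WARC header; not JWAT's actual class.
    static class Header {
        Integer warcTypeIdx; // assumed to stay null when no WARC-Type header is present
    }

    // Hypothetical record-type index, chosen only for this illustration.
    static final int RT_IDX_REVISIT = 6;

    // Unsafe: comparing a null Integer with an int auto-unboxes it
    // and throws NullPointerException.
    static boolean isRevisitUnsafe(Header header) {
        return header.warcTypeIdx == RT_IDX_REVISIT;
    }

    // Safe: check for null before the comparison triggers unboxing.
    static boolean isRevisitSafe(Header header) {
        return header.warcTypeIdx != null && header.warcTypeIdx == RT_IDX_REVISIT;
    }

    public static void main(String[] args) {
        Header missingType = new Header(); // simulates a record without WARC-Type
        try {
            isRevisitUnsafe(missingType);
        } catch (NullPointerException e) {
            System.out.println("unsafe check threw NullPointerException");
        }
        System.out.println("safe check: " + isRevisitSafe(missingType));
    }
}
```

A null guard of this kind lets a record without a WARC-Type header be reported as invalid instead of aborting the parse, which is presumably what the invalid-file tests expect.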

maeb avatar Feb 25 '19 13:02 maeb

See https://github.com/maeb/jhove/tree/maeb/jwat for a POC fix

maeb avatar Feb 25 '19 15:02 maeb

Once this is updated I need to retest #294 to see whether it's still an issue.

carlwilson avatar Feb 28 '19 10:02 carlwilson

The fix is merged. I have been in touch with @csrster about getting a release published to Maven.

maeb avatar Jun 09 '19 18:06 maeb

Tried a naive update but hit unit test problems with the WARC module:

Tests run: 79, Failures: 8, Errors: 2, Skipped: 0, Time elapsed: 0.61 sec <<< FAILURE! - in edu.harvard.hul.ois.jhove.module.WarcModuleTest
parseInvalidWarcHeaderFieldPolicy7(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.032 sec  <<< ERROR!
java.lang.NullPointerException
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.generalInvalidChecks(WarcModuleTest.java:1001)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderFieldPolicy7(WarcModuleTest.java:357)

parseInvalidWarcHeaderFieldPolicy8(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0 sec  <<< ERROR!
java.lang.NullPointerException
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.generalInvalidChecks(WarcModuleTest.java:1001)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderFieldPolicy8(WarcModuleTest.java:371)

parseInvalidWarcHeaderVersion16(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.invalidErrorExpectedCheck(WarcModuleTest.java:951)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderVersion16(WarcModuleTest.java:662)

parseInvalidWarcHeaderVersion17(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.invalidErrorExpectedCheck(WarcModuleTest.java:951)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderVersion17(WarcModuleTest.java:671)

parseInvalidWarcHeaderVersion18(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.invalidErrorExpectedCheck(WarcModuleTest.java:951)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderVersion18(WarcModuleTest.java:680)

parseInvalidWarcHeaderVersion19(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.invalidErrorExpectedCheck(WarcModuleTest.java:951)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderVersion19(WarcModuleTest.java:689)

parseInvalidWarcHeaderVersion20(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.001 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.invalidErrorExpectedCheck(WarcModuleTest.java:951)
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcHeaderVersion20(WarcModuleTest.java:698)

parseInvalidWarcReaderDiagnosis1(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.001 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<4>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcReaderDiagnosis1(WarcModuleTest.java:787)

parseInvalidWarcFileFieldsEmpty(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<16> but was:<17>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcFileFieldsEmpty(WarcModuleTest.java:150)

parseInvalidWarcFileFieldsInvalidFormat(edu.harvard.hul.ois.jhove.module.WarcModuleTest)  Time elapsed: 0.002 sec  <<< FAILURE!
java.lang.AssertionError: expected:<15> but was:<16>
        at edu.harvard.hul.ois.jhove.module.WarcModuleTest.parseInvalidWarcFileFieldsInvalidFormat(WarcModuleTest.java:165)
...
Results :

Failed tests:
  WarcModuleTest.parseInvalidWarcFileFieldsEmpty:150 expected:<16> but was:<17>
  WarcModuleTest.parseInvalidWarcFileFieldsInvalidFormat:165 expected:<15> but was:<16>
  WarcModuleTest.parseInvalidWarcHeaderVersion16:662->invalidErrorExpectedCheck:951 expected:<2> but was:<3>
  WarcModuleTest.parseInvalidWarcHeaderVersion17:671->invalidErrorExpectedCheck:951 expected:<2> but was:<3>
  WarcModuleTest.parseInvalidWarcHeaderVersion18:680->invalidErrorExpectedCheck:951 expected:<2> but was:<3>
  WarcModuleTest.parseInvalidWarcHeaderVersion19:689->invalidErrorExpectedCheck:951 expected:<2> but was:<3>
  WarcModuleTest.parseInvalidWarcHeaderVersion20:698->invalidErrorExpectedCheck:951 expected:<2> but was:<3>
  WarcModuleTest.parseInvalidWarcReaderDiagnosis1:787 expected:<3> but was:<4>
Tests in error:
  WarcModuleTest.parseInvalidWarcHeaderFieldPolicy7:357->generalInvalidChecks:1001 » NullPointer
  WarcModuleTest.parseInvalidWarcHeaderFieldPolicy8:371->generalInvalidChecks:1001 » NullPointer

So this will need a little investigation before it can be closed. It appears there are really only two distinct problems.

carlwilson avatar Oct 23 '19 08:10 carlwilson

I'll try to look into this ASAP, maybe next week. I need to contact @csrster and ask him to release a version of JWAT that includes the code in master.

Internally we use a forked version of jhove and jwat using jitpack.io to work around the lack of a release of jwat.

maeb avatar Nov 20 '19 14:11 maeb

JWAT 1.1.3 is being released to Maven Central as I write this. It includes the pull request from maeb.

nclarkekb avatar Jul 10 '22 09:07 nclarkekb