warcio icon indicating copy to clipboard operation
warcio copied to clipboard

fix payload digest for chunked response in test warc

Open nlevitt opened this issue 6 years ago • 4 comments

exposes https://github.com/webrecorder/warcio/issues/74

nlevitt avatar Oct 25 '19 21:10 nlevitt

sha1 computed like so:

$ warcio extract --payload test/data/example-iana.org-chunked.warc 405 | sha1sum
8846f23ce943a3b70089f86345626778cd93f11e  -

nlevitt avatar Oct 25 '19 21:10 nlevitt

Sorry I don't have a fix for the bug :)

nlevitt avatar Oct 25 '19 21:10 nlevitt

Bug reports lacking a fix are welcome, but pull requests are more for things that don't break the build! Also, can you please respond to the thread in #74 as to why an extremely disruptive change should be made that invalidates all digests compute befored the change? Any actual fix is going to need to support old-style-but-valid and new-style-and-valid digests.

wumpus avatar Oct 26 '19 05:10 wumpus

Also, can you please respond to the thread in #74 as to why an extremely disruptive change should be made that invalidates all digests compute befored the change?

I didn't think there was any controversy about interpretation of the warc spec. The first comment on #74 spells it out clearly and concisely (thanks @JustAnotherArchivist).

This change doesn't affect "all digests". It only affects records with tranfer-encoded payloads. I fail to see how this is "extremely disruptive"? It's not about old-style vs new-style. The fact that we're here debating this is ironic because the test warc that this pull request applies to was written by warcprox. The original warcprox code had this bug in payload digest calculation and I fixed the bug in 2017.

There are lots of warcs out there with incorrect payload digests (on chunked http responses), and lots of warcs out with correct payload digests. Given that, we should follow the spec going forward.

Bug reports lacking a fix are welcome, but pull requests are more for things that don't break the build!

Fair enough. The failing test could be commented out until the fix is in place, I suppose?

nlevitt avatar Oct 29 '19 17:10 nlevitt