warcreate

URIs with invalid characters are not escaped

Open machawk1 opened this issue 7 years ago • 1 comments

In some places on the web, invalid URIs may be used to identify resource representations. For example, at one point (perhaps still) Google Fonts recommended values like https://fonts.googleapis.com/css?family=Open+Sans:400,600,800,700|Open+Sans+Condensed:300.

The unencoded pipe (|) here is invalid per RFC 3986 (also see here), and I believe it may be WARCreate's responsibility to ensure this value is stored in WARCs in a manner that ensures interoperability.

Running $ jwattools test -e warc-in-question.warc will report these errors for invalid WARCs in the produced i.out file.

TODO: check validity of URIs, particularly in the WARC-Target-URI field, prior to associating them with a preserved entity representation.
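As a rough illustration of the kind of check the TODO describes, here is a minimal sketch in Python (not WARCreate's own code, and not a full RFC 3986 parser). The function name escape_invalid_uri_chars is hypothetical. It percent-encodes any character outside the RFC 3986 unreserved and reserved sets, leaves existing percent-encoded octets alone, and does not verify that reserved characters appear in a component where they are permitted. Whether the escaped or the original form belongs in WARC-Target-URI is still the open question in this issue.

```python
import re

# Matches anything that may not appear literally in an RFC 3986 URI:
#   - a "%" that does not start a valid percent-encoded octet, or
#   - any character outside unreserved / gen-delims / sub-delims.
# Assumption: per-component rules (e.g. "[" only in IP-literals) are ignored.
_INVALID = re.compile(
    r"%(?![0-9A-Fa-f]{2})|[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]"
)


def _percent_encode(match: re.Match) -> str:
    # Encode the offending character as UTF-8, one %XX triplet per byte.
    return "".join(f"%{byte:02X}" for byte in match.group(0).encode("utf-8"))


def escape_invalid_uri_chars(uri: str) -> str:
    """Hypothetical helper: escape characters RFC 3986 does not allow."""
    return _INVALID.sub(_percent_encode, uri)


if __name__ == "__main__":
    raw = ("https://fonts.googleapis.com/css"
           "?family=Open+Sans:400,600,800,700|Open+Sans+Condensed:300")
    print(escape_invalid_uri_chars(raw))
```

Because valid percent-encoded octets are left untouched, applying the function twice gives the same result as applying it once, so it should be safe to run on URIs that are already partially escaped.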

machawk1, Oct 22 '18 16:10

See also https://github.com/google/fonts/issues/1163

machawk1, Oct 22 '18 16:10