Alex Osborne

135 comments by Alex Osborne

> perhaps should specify that it should be same format as software field in warcinfo records, or do you think the more formal spec makes sense?

Unfortunately the format of...

I've updated the proposal, replacing `WARC-Conversion-Options` with the more specific `WARC-Conversion-Command` field.

> Unfortunately, ffmpeg does not print things out in a clean format like that

Yeah, that's one of...

Observations of WARCs found in the wild (web searches, looking at various tools):

* Observed algorithms: SHA-1, MD5, SHA-256
* SHA-1 is usually uppercase Base32 encoded (as recommended by the...
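To make the observed SHA-1 convention concrete, here is a minimal Python sketch that produces a labelled digest in `algorithm:value` form, with the SHA-1 value uppercase Base32 encoded. The helper name `warc_sha1_digest` is hypothetical and the lowercase `sha1` label is an assumption; this only illustrates the encoding, not how any particular tool computes its digests.

```python
import base64
import hashlib


def warc_sha1_digest(payload: bytes) -> str:
    """Hypothetical helper: return a labelled digest such as 'sha1:...'.

    The SHA-1 of the payload (20 bytes) is Base32 encoded, which yields
    exactly 32 uppercase characters from A-Z and 2-7, with no padding.
    """
    raw = hashlib.sha1(payload).digest()
    value = base64.b32encode(raw).decode("ascii")
    return "sha1:" + value  # lowercase label is an assumption


print(warc_sha1_digest(b"hello world"))
```

Because 160 bits divide evenly into 5-bit Base32 groups, SHA-1 values never need `=` padding, which keeps the field compact.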

> would sha512/256: work or would the slash cause a problem?

Unfortunately, `/` is disallowed in the algorithm name by the grammar, as it is part of `separators`. I've added...
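The `separators` restriction can be checked mechanically. Below is a minimal sketch, assuming the algorithm name must be an HTTP/1.1 `token` as defined in RFC 2616: one or more US-ASCII characters that are neither control characters nor separators. The function name `is_valid_algorithm` is hypothetical.

```python
# RFC 2616 separators: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_valid_algorithm(name: str) -> bool:
    """Hypothetical check: True if name is a non-empty RFC 2616 token."""
    if not name:
        return False
    for ch in name:
        code = ord(ch)
        # CHAR is US-ASCII 0-127; CTLs are 0-31 and 127 (DEL).
        if code < 32 or code >= 127:
            return False
        if ch in SEPARATORS:
            return False
    return True


for candidate in ["sha1", "sha256", "sha512/256"]:
    print(candidate, is_valid_algorithm(candidate))
```

Under this reading, `sha1` and `sha256` pass while `sha512/256` is rejected because of the slash.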

Do we have any use cases in mind for this field when reading the WARC? I guess one might be listing all the top-level crawled documents. This can't be...

> it would be useful to know if a record is for a seed URL. Or is there another common way of doing that?

For WARCs created by Heritrix a...

I agree that it can be more practical to include custom metadata in the response/resource record headers rather than in a separate metadata record, so that it can be retrieved without...

Here are a few possible solutions to the problem of discovering related records.

#### Status quo (at least in the implementations I'm aware of)

Build Wayback-style indexes mapping (uri, date)...
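As a rough illustration of the status-quo approach, the sketch below builds an in-memory index keyed by (uri, date) and uses it to find records that share a key. It assumes each entry points to a WARC filename and byte offset, as CDX-style indexes commonly do, and that dates are 14-digit timestamps; the `RecordLocation` type, the function names and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, NamedTuple, Tuple


class RecordLocation(NamedTuple):
    """Hypothetical pointer to one record inside a WARC file."""
    record_type: str  # e.g. "request", "response", "metadata"
    filename: str
    offset: int  # byte offset of the record within the file


Key = Tuple[str, str]  # (uri, 14-digit timestamp), an assumed format


def build_index(entries: Iterable[Tuple[str, str, RecordLocation]]) -> DefaultDict[Key, List[RecordLocation]]:
    """Group record locations under their (uri, date) key."""
    index: DefaultDict[Key, List[RecordLocation]] = defaultdict(list)
    for uri, date, location in entries:
        index[(uri, date)].append(location)
    return index


def related_records(index, uri: str, date: str) -> List[RecordLocation]:
    """Return every indexed record sharing the same (uri, date) key."""
    # .get() does not insert missing keys into a defaultdict.
    return list(index.get((uri, date), []))


# Hypothetical sample data: a request/response pair plus a metadata record.
entries = [
    ("http://example.com/", "20240101120000", RecordLocation("request", "a.warc.gz", 0)),
    ("http://example.com/", "20240101120000", RecordLocation("response", "a.warc.gz", 512)),
    ("http://example.com/", "20240101120000", RecordLocation("metadata", "a.warc.gz", 4096)),
]
index = build_index(entries)
for record in related_records(index, "http://example.com/", "20240101120000"):
    print(record.record_type, record.filename, record.offset)
```

In this sketch, records are only grouped when both the URI and the timestamp match exactly.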

It seems useful to allow it for requests, as the software creating the WARC file may want to identify the content type of the request payload. For example, when JavaScript...

It's present in the BNF of the last draft, so this is probably another Markdown conversion error.