Greg Lindahl

182 comments by Greg Lindahl

I think those are separate bugs from what we were just discussing, but bugs nonetheless -- can you provide a warc with records having these 'features' and a...

Bug reports lacking a fix are welcome, but pull requests are more for things that don't break the build! Also, can you please respond to the thread in #74 as...

@sebastian-nagel recently discovered that space-in-uri was an issue in Common Crawl's ARCs collected 2008-2012. Good that we are working around this issue in warcio!

I am for it, but it would require some sleuthing. Have you already made a list of the ids? Looks like it would require a bit of looking around, perhaps...

If I channel my inner @ikreymer I think I would make a list of headers which are allowed to be repeated by the HTTP standard, and then meditate on that...
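As a rough illustration of the "list of repeatable headers" idea, here is a minimal sketch that flags header names appearing more than once when they are not on an allowlist. The `REPEATABLE` set and the `unexpected_repeats` name are hypothetical and deliberately incomplete; they are not taken from warcio, and a real list would need to be built from the HTTP field definitions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a small, incomplete allowlist for illustration only.
# These are list-valued fields, plus Set-Cookie, the well-known exception.
REPEATABLE = {"set-cookie", "via", "cache-control", "accept", "vary"}


def unexpected_repeats(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return lowercase names of headers repeated without being allowlisted."""
    # Header names are case-insensitive, so count them in lowercase.
    counts = Counter(name.lower() for name, _value in headers)
    return sorted(
        name for name, count in counts.items()
        if count > 1 and name not in REPEATABLE
    )


if __name__ == "__main__":
    sample = [
        ("Content-Length", "1"),
        ("content-length", "1"),
        ("Set-Cookie", "a=1"),
        ("Set-Cookie", "b=2"),
    ]
    print(unexpected_repeats(sample))
```

Whether an unexpected repeat should be an error, a warning, or silently merged is exactly the policy question the comment leaves open.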

See https://github.com/iipc/warc-specifications/issues/33 for my opinion that it's a little unspecified if warcinfo records are necessarily parsable. But, most are. So basically you're asking that warcinfo headers be treated similarly to...

Oh, and a second implementation choice: expose a helper function that attempts to parse warcinfo from the stream. Then it would be a lot more obvious when it failed to...
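A helper along these lines might look like the sketch below. It is not warcio's API: the name `try_parse_warcinfo` is hypothetical, and it assumes the caller has already read the warcinfo record's payload into bytes and that the payload uses simple `name: value` lines. It returns `None` instead of raising, so a failed parse is obvious to the caller.

```python
from typing import Dict, Optional


def try_parse_warcinfo(payload: bytes) -> Optional[Dict[str, str]]:
    """Return warcinfo fields as a dict, or None if the payload is not parsable."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return None

    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            # Not a "name: value" line. Continuation lines are not handled
            # in this sketch, so they also land here.
            return None
        fields[name.strip()] = value.strip()

    # An empty payload carries no fields, so report it as unparsable too.
    return fields or None


if __name__ == "__main__":
    good = b"software: warcio\r\nformat: WARC File Format 1.0\r\n"
    print(try_parse_warcinfo(good))
    print(try_parse_warcinfo(b"free-form text without fields"))
```

Returning `None` rather than a partial dict is a design choice made for this sketch; a real helper might prefer to return whatever fields it managed to parse.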

@N0taN3rd traditionally you've been my best reviewer :-)

@N0taN3rd has done a preliminary review; the main addition since then is some global checks. At this point I think the code is feature-complete, well, for the things I'm planning...