Greg Lindahl
The original intent for www\d+ normalization was to fix a problem with load-balancing implementations in the early web. I tried to find some examples in Common Crawl's 2007+ archive, but...
We at Common Crawl want to stop normalizing www\d+ and eventually rebuild all of our indexes that way. I support adding that kind of option across the whole tool ecosystem.
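For anyone unfamiliar with the normalization at issue, here's a minimal sketch of what collapsing www\d+ looks like -- a hypothetical helper, not Common Crawl's actual index code:

```python
import re

# Collapse a leading www\d+ label (www1., www2., ...) when building an
# index key, which is the normalization being discussed above.
WWW_N_RE = re.compile(r"^www\d+\.")

def normalize_host(host: str) -> str:
    """Strip a leading www1., www2., ... label from a hostname."""
    return WWW_N_RE.sub("", host.lower())

print(normalize_host("www2.example.com"))  # example.com
print(normalize_host("blog.example.com"))  # blog.example.com (unchanged)
```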
The streaming interface does the right thing: if you call response.content.read(1000) and then response.close(), it looks like only the first 256k of the response is read (a single read). I observed...
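As a sketch of that access pattern, assuming an aiohttp-style client where response.content is a stream with a .read() method (the client under discussion may differ):

```python
import asyncio
import aiohttp

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Illustrative URL, not a real object path.
        resp = await session.get("https://data.commoncrawl.org/")
        first = await resp.content.read(1000)  # triggers a single buffered read
        resp.close()  # the rest of the body is never pulled off the wire
        print(len(first))

asyncio.run(main())
```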
One thing that's missing is that our filenames in warc.paths.gz are relative to the bucket. There are two possible prefixes: https://data.commoncrawl.org/ and s3://commoncrawl/. This is a pretty common pattern for...
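A minimal sketch of consuming that file, assuming a local copy of warc.paths.gz; pick whichever prefix matches how you're fetching:

```python
import gzip

# Both prefixes resolve to the same objects; the paths file itself
# only contains bucket-relative names.
HTTP_PREFIX = "https://data.commoncrawl.org/"
S3_PREFIX = "s3://commoncrawl/"

with gzip.open("warc.paths.gz", "rt") as f:
    for line in f:
        rel = line.strip()
        print(HTTP_PREFIX + rel)  # or S3_PREFIX + rel for S3 clients
```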
Tessa @tw4l is working on getting rid of the httpbin version dependency in PR 153 https://github.com/webrecorder/warcio/pull/153 -- and she's setting up GitHub Actions so we'll have CI again. Once that's...
I think this suggestion predates RFC 9309, which is the new hotness in the robots.txt space.
@sebastian-nagel is getting good at reading my mind! I was suggesting "rfc9309" as a value, since it's different from "classic".
SemVer has "build metadata" that we can use -- it looks like 1.0.0+1.0.2, which means the version is 1.0.0 and the build is 1.0.2. We would keep 1.0.0 constant...
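A minimal sketch with the third-party python-semver package (an assumed choice of parser; any SemVer-compliant parser behaves the same way):

```python
import semver  # pip install semver

v = semver.VersionInfo.parse("1.0.0+1.0.2")
print(v.major, v.minor, v.patch)  # 1 0 0  <- the spec version stays 1.0.0
print(v.build)                    # 1.0.2  <- the build metadata
# Per the SemVer spec, build metadata is ignored for precedence, so
# two builds of the same version compare as equal:
assert semver.VersionInfo.parse("1.0.0+1.0.2") == semver.VersionInfo.parse("1.0.0+1.0.3")
```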
We are going to use this SemVer feature in our Croissants. @benjelloun I think it would be useful to mention it in the 1.1 🥐 spec.
Sounds great! Thank you.