Christopher Foo

Results: 14 comments by Christopher Foo

Yeah, the thing is ridiculously slow. Don't expect ungzip performance, because it does more than ungzip. When it's extracting, it has to parse each WARC record because it's human-readable and...

Oops, another case that wasn't tested. As a workaround, maybe try something like `your_file_object.name = None`?

Hello, thanks for taking a look! When writing the notdef threshold algorithm, I knew it was too simple a heuristic and wouldn't perform well in some cases. But the line...

Oh, I'd like to add that this is an issue because my WARC file failed to derive proper CDX files on the Internet Archive: https://archive.org/details/delcampe_20140126