Tessa Walsh

111 comments by Tessa Walsh

Ad blocking via request interception added in #173, via a new `--blockAds` flag

Include documentation on updating drivers from Puppeteer (crawler

Hi @gitreich - putting this on our sprint board to look into after IIPC WAC :)

I believe that I've hit this same issue attempting to stream the partial contents of a file with aiobotocore. My use case is that I am extracting a file from...

> regular warcs + combined warc: `type: combined` or `type: web`?

+1 for `web` as the best term we've come up with so far for general WARC records capturing web traffic

Hi @cmillet2127, based on [a discussion in the minio-js repo](https://github.com/minio/minio-js/issues/619#issuecomment-326158139) I think the crawler should work as-is and minio-js will autodiscover the bucket if you use `s3.amazonaws.com` as the STORE_ENDPOINT_URL....

Also noticing that `js-wacz` is logging strings to stdout, which breaks our logging format. Might want to see what we can do about that. I suppose if we call it...

TODO:
- Add WACZ validation (not yet supported in js-wacz)
- Make CDXJ handling more memory-efficient in js-wacz (currently keeps all pages in memory, may OOM with large crawls)
- ...

Currently migrating the CI from Travis to GitHub Actions. Steps necessary include:
- [x] Remove Travis config file and add GitHub Actions workflow document to repo
- [x] Update Python...

Ashley! This is so great! This is just the kind of thing I had in mind. A great addition! I'll follow up on the details in PR #22, but thank you...