Tessa Walsh comments

Results: 216 comments


[Feature Request] Support for content blocker

> am using sometimes an ad blocker when I archive webpages with the extension; is that bad for my archiving files or the core elements of pages?

Hi @hamoudak, the...

Specifying selectors for extracting links.

> We are impacted by this issue as well at Kiwix, we have a website to ZIM relying on `` as well.
>
> Should we also develop a custom...

make browsertrix-crawler runnable in serverless environments

Thanks for flagging this!

> mkdir: cannot create directory ‘/.local’: Read-only file system
> touch: cannot touch '/.local/share/applications/mimeapps.list': No such file or directory
> /usr/bin/google-chrome: line 45: /dev/fd/63: No such file or...

make browsertrix-crawler runnable in serverless environments

> Is it possible that all the changes needed can be accommodated by chrome flags that we could already configure with `CHROME_FLAGS` as described in the [README](https://github.com/webrecorder/browsertrix-crawler#configuring-chromium--puppeteer--pywb)?

It is possible!...

Migrate from setup.py to poetry/pyproject.toml

> We can use pytest instead of "python setup.py test" without migrating -- @tw4l do you have a preferred direction that you're going towards for python testing? If not, I'm...

Migrate from setup.py to poetry/pyproject.toml

Ah, looking at the context a little bit more, I'm certainly not opposed to moving to `pyproject.toml`. And poetry does seem nice, though we're not using it for any other...

Migrate from setup.py to poetry/pyproject.toml

> For "Skip test_capture_https_proxy" do you think there's an easy way to fix it?

I faintly recall this is a urllib problem. Thankfully my past self thought to add the...

Migrate from setup.py to poetry/pyproject.toml

Pinging @ikreymer to weigh in on the release and Poetry questions. Thanks @white-gecko and @wumpus for all the work on those PRs!

Crawl JS and CSS

Hi @Dooriin, the pages.jsonl file is meant to be an index of the HTML pages only, but you should be able to find everything that was crawled in the CDXJ...
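Since `pages.jsonl` lists only HTML pages, the JS and CSS captures have to be pulled from the CDXJ index instead. Below is a minimal Python sketch of that lookup, assuming the common pywb-style CDXJ line layout (a SURT URL key, a 14-digit timestamp, then a JSON object with fields such as `url` and `mime`). The function names, the MIME set, and the sample records are hypothetical, and the comment does not say where the CDXJ file lives in the crawl output, so the sketch takes the index lines as input rather than a path.

```python
import json

# Hypothetical set of MIME types treated as "JS and CSS" for this sketch.
JS_CSS_MIMES = {
    "text/css",
    "application/javascript",
    "text/javascript",
    "application/x-javascript",
}


def parse_cdxj_line(line):
    """Split one CDXJ line into (urlkey, timestamp, fields).

    Assumes the pywb-style layout: "<urlkey> <timestamp> <json>".
    maxsplit=2 keeps any spaces inside the JSON block intact.
    """
    urlkey, timestamp, json_block = line.split(" ", 2)
    return urlkey, timestamp, json.loads(json_block)


def find_js_css(lines):
    """Yield (timestamp, url, mime) for JS/CSS records among CDXJ lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _, timestamp, fields = parse_cdxj_line(line)
        # Drop parameters such as "; charset=utf-8" before comparing.
        mime = fields.get("mime", "").split(";")[0].strip().lower()
        if mime in JS_CSS_MIMES:
            yield timestamp, fields.get("url"), mime


# Hypothetical sample records in the assumed CDXJ layout.
sample = [
    'com,example)/ 20240101000000 {"url": "https://example.com/", "mime": "text/html", "status": "200"}',
    'com,example)/app.js 20240101000001 {"url": "https://example.com/app.js", "mime": "application/javascript", "status": "200"}',
    'com,example)/style.css 20240101000002 {"url": "https://example.com/style.css", "mime": "text/css", "status": "200"}',
]

for record in find_js_css(sample):
    print(record)
```

To scan a real index, open the CDXJ file and pass the file object to `find_js_css`, since iterating over a text file yields its lines one at a time.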

Crawl JS and CSS

Hi @Dooriin, sorry for the delayed response! This is something we're actually looking into now as we develop features around assisted crawl QA in Browsertrix Cloud. We have a PR...