Tessa Walsh

216 comments by Tessa Walsh

This should no longer be happening in 1.1.x and forward - we added some checks to make sure that the WARC records are written before the crawler shuts down as...

> * recrawls, how to do them ([was asked](https://forum.webrecorder.net/t/ability-to-retry-errors/185) before here)

Currently there's no way to partially re-crawl with browsertrix-crawler. In our Browsertrix Cloud system you can use the archiveweb.page...

I'm struggling with the same HTTPS proxy issue as you document above, but hopefully will work it out soon!

Turns out pinning urllib3 to an older version for now resolves it! PR to switch to GitHub Actions CI is now open :) https://github.com/webrecorder/warcio/pull/164

> My current proposal is to revise all `warcio` functions returning a datetime and do something like:
>
> ```
> def timestamp_to_datetime(string, tzinfo:datetime.timezone=None) -> datetime.datetime:
> # ^^^^^^^ HERE...
> ```

Interesting! I'm all for making changes to speed up virus scanning so long as the user configuration doesn't get too complicated. Would you have bandwidth to look into this a...

When the multi-WACZs being produced in this branch are loaded into ReplayWeb.page, no seed pages or resources are listed. There may be something slightly off, investigating further.

Tested on dev and working well! Nice job

Assigned to me to investigate what backend sorting changes will be necessary.

Notes on changes that will be necessary for this:

## Archived Items

### In table, not sortable

- Name (with `firstSeedURL + x URLs` fallback)
- Pages crawled

### Sortable,...