Tessa Walsh
Hi Anders! Thanks for reporting this, definitely seems like a bug we'd want to address. By any chance, are the WACZ files you're uploading from Browsertrix multi-WACZs? Just a hunch...
I have verified that this is an issue with multi-WACZs: our routine to read the page list on WACZ upload doesn't account for multi-WACZ files. Unfortunately the remotezip library we're using...
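To illustrate the distinction, here is a minimal sketch of a page-list reader that also descends into nested WACZ files. It assumes that a multi-WACZ is a zip archive whose members include other `.wacz` files, and that each single WACZ stores its page list at `pages/pages.jsonl` with a JSON header line first. It uses Python's standard `zipfile` on a local file or in-memory buffer rather than remotezip, and the helper name `read_pages` is hypothetical, not the actual upload routine.

```python
import io
import json
import zipfile

PAGES_PATH = "pages/pages.jsonl"  # assumed page list location in a single WACZ


def _pages_from_jsonl(data: bytes) -> list[dict]:
    """Parse a pages.jsonl blob, keeping only records that have a URL.

    This skips the header line (e.g. {"format": "json-pages-1.0", ...})
    and any blank lines.
    """
    pages = []
    for line in data.decode("utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if "url" in record:
            pages.append(record)
    return pages


def read_pages(source) -> list[dict]:
    """Return page records from a WACZ, recursing into nested .wacz members.

    `source` may be a path or a binary file-like object. This is a
    hypothetical helper; the nested-zip multi-WACZ layout is an assumption.
    """
    pages = []
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        if PAGES_PATH in names:
            pages.extend(_pages_from_jsonl(zf.read(PAGES_PATH)))
        for name in names:
            if name.endswith(".wacz"):
                # Nested WACZ: load it into memory and read its pages too.
                pages.extend(read_pages(io.BytesIO(zf.read(name))))
    return pages
```

A reader that only looks for `pages/pages.jsonl` at the top level would return an empty list for such a multi-WACZ, which is the kind of gap described above.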
@tw4l to create a feature document as a first step; the likely implementation involves uploading the list as a file that the crawler can download.
Supported in 1.18, which will be released shortly!
Hi @xiaozhile, thanks for the report! If I'm understanding your use case correctly, this is something you should already be able to do in ArchiveWeb.page. If you open the browser...
I wonder if it might be better to direct fetch any URL that ends in a file extension (and that's not `.html` or `.htm`, since some older sites followed that...
> Yeah, maybe that's a smaller list to maintain; it would also include .asp, .php, etc. Another option is to always try browser load, and then if non-HTML, add extension to...
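The extension heuristic discussed in these two comments could look roughly like the sketch below: a URL whose path ends in a file extension is fetched directly, unless that extension usually serves HTML. The function name `should_direct_fetch` is hypothetical, and the extension set only contains the examples mentioned above, so a real list would need extending.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions that usually serve HTML, so those URLs should still be loaded
# in the browser. Illustrative only: taken from the examples in this thread.
HTML_LIKE_EXTENSIONS = {".html", ".htm", ".asp", ".php"}


def should_direct_fetch(url: str) -> bool:
    """Return True if the URL path ends in a non-HTML file extension."""
    # Use only the path, so query strings and fragments are ignored.
    path = urlsplit(url).path
    # splitext only inspects the last path segment.
    _, ext = posixpath.splitext(path)
    if not ext:
        return False  # no extension, e.g. "/about/" or "/"
    return ext.lower() not in HTML_LIKE_EXTENSIONS


for u in ["https://example.com/files/report.pdf", "https://example.com/page.php?id=3"]:
    print(u, should_direct_fetch(u))
```

The alternative in the second comment (always try a browser load, then fall back for non-HTML responses) avoids maintaining a list at the cost of an extra load for each non-HTML resource.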
Hi @pato-pan, on the latest Browsertrix Crawler releases (since 1.6.3), the disk utilization check should be disabled by default. It looks like you're hitting a related but different check...
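As a rough illustration of what a disk utilization check does, here is a minimal sketch built on Python's `shutil.disk_usage`: it compares the percentage of disk in use against a threshold, and a threshold of 0 turns the check off, mirroring the "disabled by default" behavior mentioned above. The function names and the 0-means-disabled convention are assumptions for illustration, not Browsertrix Crawler's actual code or option semantics.

```python
import shutil


def usage_exceeds(used: int, total: int, threshold_percent: float) -> bool:
    """Return True if used/total is at or above threshold_percent.

    A threshold of 0 or less disables the check (assumed convention).
    """
    if threshold_percent <= 0 or total <= 0:
        return False
    return used / total * 100 >= threshold_percent


def disk_usage_exceeded(path: str, threshold_percent: float = 0) -> bool:
    """Check the filesystem containing `path`; disabled by default."""
    usage = shutil.disk_usage(path)
    return usage_exceeds(usage.used, usage.total, threshold_percent)
```

Splitting the arithmetic from the `shutil.disk_usage` call keeps the threshold logic easy to test without depending on the actual state of the disk.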
Hi @furllmm, thanks for the issue. Have you tried enabling "Archive local storage" in the extension settings? First, open Settings by clicking on the cog icon in the extension homepage,...
Hi, this is a known issue: our tools tend not to capture or replay existing web archives well. The issue stems from the fact that the Internet Archive's Wayback Machine...