Tessa Walsh
Hi @tuehlarsen, the `manual-20230225141525-7c09730b-c08` part of the WACZ filename should be the crawl id in Browsertrix! You can check the crawl id field in the crawl's Overview tab to verify...
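To illustrate, under this naming convention the crawl id is simply the WACZ filename with its extension stripped. A minimal sketch (the filename below is the example from this thread; the helper name is hypothetical):

```python
from pathlib import Path

def crawl_id_from_wacz(filename: str) -> str:
    # The WACZ filename stem is the crawl id under the default naming
    # convention, so stripping the extension recovers it.
    return Path(filename).stem

print(crawl_id_from_wacz("manual-20230225141525-7c09730b-c08.wacz"))
# → manual-20230225141525-7c09730b-c08
```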
It's worth noting that the same crawl id is part of the naming convention for the WARC files within the WACZ as well, but the WARC filenames have additional prefixes...
Ah @tuehlarsen, I forgot that this is actually configurable in the Helm chart, which explains why what I was seeing on our dev server differed. In `chart/values.yaml`, take a look...
For now, the backend change to make is to ensure that running crawls are always floated to the top for the "Latest Crawl" sort order. We may look into additional...
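The floating-to-the-top behavior can be sketched with a two-part sort key: running state first, then finish time descending. This is an illustrative sketch only, not the actual backend code; the field names are assumptions:

```python
from datetime import datetime, timezone

# Hypothetical crawl records: running crawls have no finish time yet.
crawls = [
    {"id": "a", "state": "complete", "finished": datetime(2023, 2, 1, tzinfo=timezone.utc)},
    {"id": "b", "state": "running", "finished": None},
    {"id": "c", "state": "complete", "finished": datetime(2023, 3, 1, tzinfo=timezone.utc)},
]

def latest_crawl_key(crawl):
    # Rank running crawls above everything else, then sort finished
    # crawls by finish time, newest first.
    is_running = crawl["state"] == "running"
    ts = crawl["finished"].timestamp() if crawl["finished"] else 0.0
    return (0 if is_running else 1, -ts)

ordered = sorted(crawls, key=latest_crawl_key)
print([c["id"] for c in ordered])  # → ['b', 'c', 'a']
```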
Hi @hamzamac, would you be able to share the URL of the site you're trying to capture so I can take a look?
Hm, you shouldn't need to include the URIs for scripts - if the script is on the page, the crawler will discover it. This looks to me like it's more...
Documenting for future reference - at this point, robots.txt support in Browsertrix is at the page level only. Pages that are disallowed by per-host robots.txts will be skipped rather than...
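Page-level robots.txt filtering of this kind can be checked with Python's standard `urllib.robotparser`; this is a generic sketch of the concept, not the crawler's implementation, and the rules and URLs are made up for the example:

```python
from urllib import robotparser

# Parse a per-host robots.txt from a literal to keep the example
# self-contained; a crawler would fetch it from the host instead.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Individual pages disallowed by the rules are skipped; allowed
# pages on the same host are still crawled.
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```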
Hi @kieranjol, apologies for the delay in responding but yes, this sounds like a great improvement! Feel free to submit a PR if you'd like, or I can try to...
This is an interesting suggestion. Just so you're aware, recent versions of the crawler do automatically retry pages that fail to load during the initial attempt at the end of...
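The retry-at-end-of-crawl idea can be sketched as a queue where failed pages are re-enqueued and attempted again only after the initial pass finishes. This is a simplified illustration under assumed names, not the crawler's actual retry logic:

```python
from collections import deque

def crawl(pages, fetch, max_retries=1):
    # Pages that fail on the first attempt are appended back to the
    # end of the queue, so retries happen after the initial pass.
    queue = deque((url, 0) for url in pages)
    succeeded, failed = [], []
    while queue:
        url, attempts = queue.popleft()
        if fetch(url):
            succeeded.append(url)
        elif attempts < max_retries:
            queue.append((url, attempts + 1))
        else:
            failed.append(url)
    return succeeded, failed

# Hypothetical flaky fetcher: one page fails on its first attempt only.
calls = {}
def flaky_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    return url != "https://example.com/b" or calls[url] > 1

ok, bad = crawl(["https://example.com/a", "https://example.com/b"], flaky_fetch)
print(ok, bad)  # → ['https://example.com/a', 'https://example.com/b'] []
```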
Backend support added in https://github.com/webrecorder/browsertrix/pull/2505 with the rest of the custom behaviors backend implementation.