Tessa Walsh
Hi @tuehlarsen, the `manual-20230225141525-7c09730b-c08` part of the WACZ filename should be the crawl id in Browsertrix! You can check the crawl id field in the crawl's Overview tab to verify...
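To illustrate, under this naming convention the crawl id is simply the WACZ filename with its extension stripped. A minimal sketch (the filename below is the example from this thread; the helper name is hypothetical):

```python
from pathlib import Path

def crawl_id_from_wacz(filename: str) -> str:
    # The WACZ filename stem is the crawl id under the default naming
    # convention, so stripping the extension recovers it.
    return Path(filename).stem

print(crawl_id_from_wacz("manual-20230225141525-7c09730b-c08.wacz"))
# → manual-20230225141525-7c09730b-c08
```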
It's worth noting that the same crawl id is part of the naming convention for the WARC files within the WACZ as well, but the WARC filenames have additional prefixes...
Ah @tuehlarsen, I forgot that this is actually configurable in the Helm chart, which explains why what I was seeing on our dev server differed. In `chart/values.yaml`, take a look...
For now, the backend change to make is to ensure that running crawls are always floated to the top for the "Latest Crawl" sort order. We may look into additional...
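The floating-to-the-top behavior can be sketched with a two-part sort key: running state first, then finish time descending. This is an illustrative sketch only, not the actual backend code; the field names are assumptions:

```python
from datetime import datetime, timezone

# Hypothetical crawl records: running crawls have no finish time yet.
crawls = [
    {"id": "a", "state": "complete", "finished": datetime(2023, 2, 1, tzinfo=timezone.utc)},
    {"id": "b", "state": "running", "finished": None},
    {"id": "c", "state": "complete", "finished": datetime(2023, 3, 1, tzinfo=timezone.utc)},
]

def latest_crawl_key(crawl):
    # Rank running crawls above everything else, then sort finished
    # crawls by finish time, newest first.
    is_running = crawl["state"] == "running"
    ts = crawl["finished"].timestamp() if crawl["finished"] else 0.0
    return (0 if is_running else 1, -ts)

ordered = sorted(crawls, key=latest_crawl_key)
print([c["id"] for c in ordered])  # → ['b', 'c', 'a']
```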
Hi @hamzamac, would you be able to share the URL of the site you're trying to capture so I can take a look?
Hm, you shouldn't need to include the URIs for scripts - if the script is on the page, the crawler will discover it. This looks to me like it's more...
Documenting for future reference - at this point, robots.txt support in Browsertrix is at the page level only. Pages that are disallowed by per-host robots.txts will be skipped rather than...
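Page-level robots.txt filtering of this kind can be checked with Python's standard `urllib.robotparser`; this is a generic sketch of the concept, not the crawler's implementation, and the rules and URLs are made up for the example:

```python
from urllib import robotparser

# Parse a per-host robots.txt from a literal to keep the example
# self-contained; a crawler would fetch it from the host instead.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Individual pages disallowed by the rules are skipped; allowed
# pages on the same host are still crawled.
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```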
Hi @kieranjol, apologies for the delay in responding but yes, this sounds like a great improvement! Feel free to submit a PR if you'd like, or I can try to...
This is an interesting suggestion. Just so you're aware, recent versions of the crawler do automatically retry pages that fail to load during the initial attempt at the end of...
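The retry-at-end-of-crawl idea can be sketched as a queue where failed pages are re-enqueued and attempted again only after the initial pass finishes. This is a simplified illustration under assumed names, not the crawler's actual retry logic:

```python
from collections import deque

def crawl(pages, fetch, max_retries=1):
    # Pages that fail on the first attempt are appended back to the
    # end of the queue, so retries happen after the initial pass.
    queue = deque((url, 0) for url in pages)
    succeeded, failed = [], []
    while queue:
        url, attempts = queue.popleft()
        if fetch(url):
            succeeded.append(url)
        elif attempts < max_retries:
            queue.append((url, attempts + 1))
        else:
            failed.append(url)
    return succeeded, failed

# Hypothetical flaky fetcher: one page fails on its first attempt only.
calls = {}
def flaky_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    return url != "https://example.com/b" or calls[url] > 1

ok, bad = crawl(["https://example.com/a", "https://example.com/b"], flaky_fetch)
print(ok, bad)  # → ['https://example.com/a', 'https://example.com/b'] []
```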
Backend support added in https://github.com/webrecorder/browsertrix/pull/2505 with the rest of the custom behaviors backend implementation.