Tessa Walsh
Will require backend and frontend changes

## In table, not sortable in backend yet

- Name (with `firstSeedURL + x URLs` fallback)
- Pages crawled

## Sortable, not in table...
Follow-up to #2093. Related to #578. The backend tests for org storage cover adding and removing a custom storage, as well as setting the primary and replica storage locations for...
### What change would you like to see?

Requested on IIPC Slack: "We need the option to set a request header name and value in the configuration. It could be...
Browsertrix Crawler now creates `pageinfo` records, which are a key component of the Browsertrix quality assurance system. We should document these records, either in the warc-specifications repository or our own...
Hi, In Bulk Extractor 2.1.1, the following command still carves out jpeg files to the `jpeg` directory: `bulk_extractor -o be_out -S jpeg_carve_mode=0 /path/to/source/dir` I see https://github.com/simsong/bulk_extractor/issues/468 fixed some issues related...
Currently pywb can add WACZ files to a collection via unpacking. The next step is to properly support WACZ files as-is.
Fixes #841. Crawler work toward long URL lists in Browsertrix. This PR moves seed handling from the arg parser's validation step to the crawler's bootstrap step in order to be...
Connected to https://github.com/webrecorder/browsertrix/issues/2312

Similar to custom behaviors, the crawler should be able to download a seed file from any accessible URL, simply by specifying a URL instead of a filepath to...
Temporary solution to https://github.com/webrecorder/browsertrix/issues/2947 to get backend CI working again until Browsertrix has Python 3.14 support. Nightly test run: https://github.com/webrecorder/browsertrix/actions/runs/19048128002
Fixes #2947. Bumps pydantic and fastapi to latest versions, pinned to specific versions to ensure we're not accidentally affected by a breaking change. Nightly test run from this branch: https://github.com/webrecorder/browsertrix/actions/runs/19042993453...