Tessa Walsh issues

Results: 78 issues by Tessa Walsh


Update column sorting for archived Items list page

Will require backend and frontend changes.

## In table, not sortable in backend yet
- Name (with `firstSeedURL + x URLs` fallback)
- Pages crawled

## Sortable, not in table...

Add nightly integration tests for modifying org storage

Follow-up to #2093. Related to #578. The backend tests for org storage cover adding and removing a custom storage, as well as setting the primary and replica storage locations for...

[Feature]: Add support for custom crawl headers

### What change would you like to see?
Requested on IIPC Slack: "We need the option to set a request header name and value in the configuration. It could be...

enhancement

Document WARC pageinfo records

Browsertrix Crawler now creates `pageinfo` records, which are a key component of the Browsertrix quality assurance system. We should document these records, either in the warc-specifications repository or our own...

Disabling JPEG carver not working

Hi, in Bulk Extractor 2.1.1, the following command still carves out JPEG files to the `jpeg` directory: `bulk_extractor -o be_out -S jpeg_carve_mode=0 /path/to/source/dir` I see https://github.com/simsong/bulk_extractor/issues/468 fixed some issues related...

bug

needs test case

carver

Add full support for WACZ

Currently pywb can add WACZ files to a collection via unpacking. The next step is to properly support WACZ files as-is.

Support downloading seed file from URL

Fixes #841. Crawler work toward long URL lists in Browsertrix. This PR moves seed handling from the arg parser's validation step to the crawler's bootstrap step in order to be...

Support downloading seedFile from online source

Connected to https://github.com/webrecorder/browsertrix/issues/2312 Similar to custom behaviors, the crawler should be able to download a seed file from any accessible URL, simply by specifying a URL instead of a filepath to...

Pin Python to 3.12 for all CI

Temporary solution to https://github.com/webrecorder/browsertrix/issues/2947 to get backend CI working again until Browsertrix has Python 3.14 support. Nightly test run: https://github.com/webrecorder/browsertrix/actions/runs/19048128002

WIP: Bump pydantic and fastapi to support Python 3.14

Fixes #2947. Bumps pydantic and fastapi to their latest versions, pinned to specific versions to ensure we're not accidentally affected by a breaking change. Nightly test run from this branch: https://github.com/webrecorder/browsertrix/actions/runs/19042993453...