Tessa Walsh
The way that pywb rewrites URLs for POST (and other non-GET) request canonicalization ends up writing Pythonic values into the URL such as `True`, `False`, and `None`, whereas we ideally...
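As a rough illustration of how such values can appear (this is not pywb's actual code; `naive_query` is a hypothetical helper), stringifying parsed JSON values with Python's defaults yields `True`, `False`, and `None` rather than their JSON spellings:

```python
import json
from urllib.parse import urlencode


def naive_query(body: str) -> str:
    """Hypothetical helper: flatten a JSON request body into a query string.

    urlencode() calls str() on each value, so Python's spellings of
    booleans and null end up in the resulting URL.
    """
    data = json.loads(body)  # JSON true/false/null become True/False/None
    return urlencode(data)


print(naive_query('{"a": true, "b": false, "c": null}'))
# prints: a=True&b=False&c=None
```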
## Description This PR modifies pywb to detect and rewrite JS modules. Global overrides are imported in modules via a new [_wb_module_decl.js](https://github.com/webrecorder/pywb/compare/main...dev/issue-674-module-rewriting#diff-ff32b62008ae6fedbe3f594839c8b45d30dec26ffb82a2a61f711ef4ec7e9488) static file. This also bumps wombat to 3.4.2....
Connected to #182. May be irrelevant if we switch to using Brave for crawling.
Improvements for the 1.0.0 branch of the crawler:
- Switch from using py-wacz to [js-wacz](https://github.com/harvard-lil/js-wacz) for WACZ generation
- Pass in indexes from `/tmp-cdx` rather than reindexing from WARCs
- Support...
First pointed out in: https://github.com/webrecorder/browsertrix-crawler/issues/74#issuecomment-1087661811
PR https://github.com/webrecorder/browsertrix-crawler/pull/28 writes extracted full text into `pages.jsonl`, which makes that file quite large and difficult to parse. We may want to rethink where the extracted...
Fixes #903. Needs tests and rebasing on main after #1394 is merged.
Fixes #890. This PR introduces new org import and export API endpoints, as well as new administrator deployment documentation on how to manage the process of exporting and importing orgs,...
Frontend follow-up to https://github.com/webrecorder/browsertrix-cloud/issues/903. This action will need careful validation, as it will delete an org and all its constituent data. This should be available as an...
Related to https://github.com/webrecorder/browsertrix-cloud/issues/1551. We should publish our API ReDoc site as a separate static page (ideally versioned by version of the app) rather than pointing people to running instances of...
Backend API endpoint (also deleting all related data)