pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
I would like to know the original page source of a record. Is this possible? Let's say some Google font is used on the page https://example.com/about. A response or revisit record...
It would be nice if a revisit record has a WARC-Refers-To field as is recommended in the WARC specification. https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#profile-identical-payload-digest
I have a recording for which I configured dedup_policy: revisit. The request record for a resource that has already been visited has a WARC-Concurrent-To field. Unfortunately that field value does...
## Is your feature request related to a problem? Please describe. I am working with filtered downloads of the Common Crawl dataset (~100TB, with plans to grow to ~200TB), so...
## Describe the bug When used to record an HTTP response that uses `Transfer-Encoding: chunked`, pywb produces a WARC record where the chunks have been decoded but the `Transfer-Encoding` header...
Is it possible to set the host_prefix variable to a custom value when rewriting HTML? I am currently using the following setup: pywb runs in a Docker container and sits behind an nginx reverse proxy...
## Describe the solution you'd like I want to archive an old Google site that requires a Google login, but the page doesn't work in pywb. I could access...
Hello there, I'd like to make a feature request - a way to list all URLs in `pywb`. [Web Archive Player](https://github.com/ikreymer/webarchiveplayer) has a similar feature, and it would be nice...
## Describe the bug When pywb rewrites `eval` it wraps it in a function call. Unfortunately this breaks code which declares function-scoped variables using `var` and then accesses them outside...
When I deploy to Heroku, it says "Application error". Here are the logs: