pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
## Describe the bug Edit: three-week review and cleanup. At the bottom of this report is my config.yaml for reference. Following along the documentation with regard to fallbacks via...
## Expected behavior I crawled this Twitter account: https://twitter.com/PACKEDvzw with both Brozzler and Browsertrix. The Brozzler WARC file is 215 MB. I would expect that I can scroll down the...
Hi, I'm using pywb 2.1.0. In `pywb/warcserver/index/query.py` we can set a param to limit the number of index lines returned (default: 100000 lines). pywb/warcserver/index/query.py (line 69): ``` @property def...
## Describe the bug Let's say I try to record the current Google page by calling `https://wayback.url.com/collection/record/https://www.google.com/`. I just see an empty screen. Is that expected? If I remove...
## Describe the bug A `pywb` instance that is configured to run in 'proxy-mode' for replay, not recording, shows 'Pywb Error No handler for ``' instead of the expected archived...
I have a web archive with a custom directory structure (recorded in other software). Is it possible to scan this structure automatically for new WARC files without moving them to...
## Describe the bug Firefox gives the error MOZILLA_PKIX_ERROR_MITM_DETECTED because it sees the pywb proxy cert ## Steps to reproduce the bug Enable proxy mode using docker (docker run -p 9090:8080...
As per [RFC 5988](https://tools.ietf.org/html/rfc5988) arbitrary attributes are not allowed in `Link`, hence `collection` attribute in `Link` header and TimeMap entity `MUST` be removed or incorporated as per [RFC 6573](https://tools.ietf.org/html/rfc6573). See:...
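To illustrate the first option mentioned in the excerpt above (removing the attribute), here is a minimal sketch that drops one named parameter from a single `Link` value. It is not pywb code: the function name `strip_link_param`, the example URL, and the `demo` value are all hypothetical, and the sketch assumes the parameters themselves contain no `;` characters.

```python
def strip_link_param(link_value, name):
    """Return a single Link value without the parameter called `name`.

    Assumes one link-value of the form '<target>; key="value"; ...' whose
    parameter values contain no ';' characters (an assumption of this sketch).
    """
    # Split the '<target>' part from the parameters so that a ';' inside
    # the target URL is never mistaken for a parameter separator.
    target_end = link_value.index(">") + 1
    target, rest = link_value[:target_end], link_value[target_end:]

    kept = []
    for param in rest.split(";"):
        param = param.strip()
        if not param:
            continue
        # Parameter names are compared case-insensitively.
        key = param.split("=", 1)[0].strip().lower()
        if key != name.lower():
            kept.append(param)

    return "; ".join([target.strip()] + kept)


# Hypothetical header value, for illustration only.
example = (
    '<http://example.com/timemap>; rel="timemap"; '
    'type="application/link-format"; collection="demo"'
)
cleaned = strip_link_param(example, "collection")
```

Splitting off the target first keeps the sketch short; a full implementation would need a real `Link` header parser that handles quoted strings and multiple comma-separated values.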
I made a WARC file with storm-crawler, and on this website - https://lietuva.lt/ - images are represented as: div class="fold" style="background-image: url('https://lietuva.lt/wp-content/uploads/2019/07/VMG-Alfas-Ivanauskas-šaltibarščiai.jpg')" when I open my crawled warc file with...
## Is your feature request related to a problem? Please describe. boto3 allows custom endpoint_urls, but they cannot be set via an environment variable. This would allow using `s3://`...
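As a rough illustration of what the feature request above describes, the sketch below reads a custom endpoint from the environment and passes it to `boto3.client` through its `endpoint_url` parameter. The variable name `S3_ENDPOINT_URL` and both helper functions are hypothetical; nothing here is taken from pywb's code.

```python
import os

# Hypothetical variable name chosen for this sketch; not a pywb setting.
ENDPOINT_ENV_VAR = "S3_ENDPOINT_URL"


def s3_client_kwargs(environ=None):
    """Build keyword arguments for boto3.client("s3", ...) from the environment."""
    environ = os.environ if environ is None else environ
    kwargs = {}
    endpoint = environ.get(ENDPOINT_ENV_VAR, "").strip()
    if endpoint:
        # endpoint_url is an existing parameter of boto3's client() factory.
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_s3_client(environ=None):
    """Create an S3 client, using the custom endpoint if one is set."""
    # Imported lazily so s3_client_kwargs() works without boto3 installed.
    import boto3

    return boto3.client("s3", **s3_client_kwargs(environ))
```

With the variable unset, `make_s3_client()` behaves like a plain `boto3.client("s3")`; setting it to something like `http://localhost:9000` would point the client at an S3-compatible service instead.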