pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
I am new to pywb. It works fine for most websites, but on some websites images and videos return a 404 Not Found error. I have added hypothesis...
Hello, I am having trouble with pywb on Windows. When I run with the property `framed_replay: true`, I get a blank page, and in the browser console I can see the error: "Uncaught...
I am using the latest version of pywb, but recording other websites takes a long time. I have created http://mydomain.com/live/https:google.com; it works fine but takes more...
## Describe the bug Resources are returned extremely slowly (~3 minutes) for a large collection (34Gb, >1m records). While the page is loading, exactly one core of the server's CPU...
This change breaks our `archive_paths: "webhdfs://server/"` because `os.path.join` just discards the prefix when the suffix is an absolute path. https://github.com/webrecorder/pywb/blob/92e459bda52a2b03f33a4b0b8094ed424248d2a5/pywb/warcserver/resource/pathresolvers.py#L40
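The join behaviour described in this report can be reproduced with the standard library alone. The following is a minimal sketch: it uses `posixpath.join` (what `os.path.join` resolves to on POSIX systems) so the result is the same on any platform. The WARC path and the `join_prefix` helper are hypothetical illustrations, not pywb code and not necessarily how the project resolved the issue.

```python
import posixpath

prefix = "webhdfs://server/"
suffix = "/data/warcs/example.warc.gz"  # hypothetical absolute path

# join() restarts from any absolute component, so the prefix is lost.
joined = posixpath.join(prefix, suffix)
print(joined)  # /data/warcs/example.warc.gz


def join_prefix(prefix: str, path: str) -> str:
    """Hypothetical helper: plain concatenation so the prefix always survives."""
    return prefix.rstrip("/") + "/" + path.lstrip("/")


kept = join_prefix(prefix, suffix)
print(kept)  # webhdfs://server/data/warcs/example.warc.gz
```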
Being able to index and re-index collections that are located on remote storage (S3) would be very helpful.
TimeGate in redirect mode `MUST` use `302`-style content negotiation and not `307`, which is not part of the Memento RFC. Should `307`-style be mandatory, the matter must be discussed with...
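As a rough illustration of the `302`-style negotiation requested here, the sketch below builds the status code and headers a TimeGate might return when redirecting to a selected memento. It follows the general shape in RFC 7089 (a `Vary: accept-datetime` header, a `Location`, and a `Link` to the original resource). The function name and URLs are hypothetical, and this is not pywb's implementation.

```python
def timegate_redirect(original_uri: str, memento_uri: str):
    """Return (status, headers) for a 302 TimeGate redirect (illustrative only)."""
    headers = {
        # Signals that the response depends on the Accept-Datetime request header.
        "Vary": "accept-datetime",
        # The memento chosen for the requested datetime.
        "Location": memento_uri,
        # Points back to the original (live web) resource.
        "Link": f'<{original_uri}>; rel="original"',
    }
    return 302, headers


# Hypothetical URLs, for illustration only.
status, headers = timegate_redirect(
    "http://example.com/",
    "http://archive.example.org/coll/20200101000000/http://example.com/",
)
```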
If there are variations of mementos (e.g., banner, rewritten, raw), the community `SHOULD` discuss how to report them in `Link` header and TimeMaps and which ones should be reported in...
Fix PyWB documentation to align with the implementation. See: https://ws-dl.blogspot.com/2020/03/2020-03-26-memento-compliance-audit-of.html#3-4-timegate
Navigational memento link relations (i.e., `first`, `prev`, `next`, and `last`) are recommended to be included in `Link` header of TimeGate and memento responses as many tools rely on them. See:...
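To make the navigational relations concrete, here is a minimal sketch that assembles a `Link` header value containing `first`, `prev`, `next`, and `last` memento entries in the style used by RFC 7089. The function name, archive URLs, and datetimes are assumptions made for illustration; pywb's actual header construction may differ.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_link_header(original_uri, mementos):
    """Build a Link header value from {relation: (memento_uri, datetime)}."""
    parts = [f'<{original_uri}>; rel="original"']
    for rel, (uri, dt) in mementos.items():
        # Datetimes are serialized in HTTP-date (GMT) form.
        stamp = format_datetime(dt, usegmt=True)
        parts.append(f'<{uri}>; rel="{rel} memento"; datetime="{stamp}"')
    return ", ".join(parts)


# Hypothetical archive prefix and capture times.
base = "http://archive.example.org/coll"
target = "http://example.com/"
header = memento_link_header(target, {
    "first": (f"{base}/20000620180259/{target}",
              datetime(2000, 6, 20, 18, 2, 59, tzinfo=timezone.utc)),
    "prev": (f"{base}/20100101000000/{target}",
             datetime(2010, 1, 1, tzinfo=timezone.utc)),
    "next": (f"{base}/20150101000000/{target}",
             datetime(2015, 1, 1, tzinfo=timezone.utc)),
    "last": (f"{base}/20200101000000/{target}",
             datetime(2020, 1, 1, tzinfo=timezone.utc)),
})
print(header)
```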