warcit
Convert Directories, Files and ZIP Files to Web Archives (WARC)
If the user specifies a prefix that is missing the scheme or otherwise results in invalid URLs, issue a warning and/or attempt to correct it. Example in webrecorder/webrecorder-player#96
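The check requested above could look roughly like the following sketch. `normalize_prefix` and its `default_scheme` parameter are hypothetical names chosen for illustration and are not part of warcit; the choice to accept only `http`/`https` and to fall back to `http` is also an assumption.

```python
import warnings
from urllib.parse import urlsplit


def normalize_prefix(prefix, default_scheme="http"):
    """Hypothetical helper (not warcit API): validate a URL prefix.

    Returns the prefix unchanged if it is an http(s) URL with a host,
    prepends a scheme (with a warning) if the scheme is missing,
    and raises ValueError for anything else.
    """
    parts = urlsplit(prefix)

    # Already a usable http(s) URL: nothing to do.
    if parts.scheme in ("http", "https") and parts.netloc:
        return prefix

    # No scheme at all: try to correct by prepending the default one.
    if not parts.scheme:
        fixed = default_scheme + "://" + prefix.lstrip("/")
        if urlsplit(fixed).netloc:
            warnings.warn(
                "prefix %r has no scheme, using %r instead" % (prefix, fixed)
            )
            return fixed

    # Unsupported scheme or no recoverable host.
    raise ValueError("invalid URL prefix: %r" % (prefix,))
```

With this sketch, `normalize_prefix("example.com/site/")` warns and returns `http://example.com/site/`, while a prefix such as `ftp://example.com/` is rejected rather than silently rewritten.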
Hi all, I'm trying to set this up under Python 3 on macOS 10.14.1 and keep getting this error: ``` clang: warning: libstdc++ is deprecated; move to libc++ with a...
I downloaded a website from Internet Archive using [wayback-machine-downloader](https://github.com/hartator/wayback-machine-downloader) then created a WARC using warcit with the following command: `warcit --fixed-dt 20100212221453 http://domainname.com /dirpath`. It did create a WARC file....
Hi! I would like to use the warcit library, but as an API rather than through the command line. This is because the warcit library right now only goes through...
- [ ] charset detection with files encoded differently
- [ ] `--include` / `--exclude`
- [ ] CSV data (`--mapfile`)
- [ ] fix tests for Windows
- [...
Document these features in README.rst:
- [ ] `--no-xhtml`
- [ ] `--include` / `--exclude`
- [ ] `--charset` options
- [ ] `--magic` options
- [ ] `--mapfile`
Possibly hinting at other escaping issues. Example:
```
WARC/1.0
WARC-Date: 2004-11-10T16:15:13Z
WARC-Source-URI: file://waste/images/17#.jpg
WARC-Created-Date: 2018-02-06T16:26:13Z
WARC-Type: resource
WARC-Record-ID:
WARC-Target-URI: http://heise.de/tp/kunst/waste/images/17#.jpg
WARC-Payload-Digest: sha1:GLC3CKKQ4LSVN4FD75TBXBOOAHA6WP6N
WARC-Block-Digest: sha1:GLC3CKKQ4LSVN4FD75TBXBOOAHA6WP6N
Content-Type: image/jpeg
Content-Length: 5222
```
Should...
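The issue text is cut off, so the intended fix is not known. One plausible reading is that the `#` in the file name should be percent-encoded, since an unescaped `#` starts the URL fragment. The sketch below illustrates that with the standard library only; `path_to_url` is a hypothetical helper, not a warcit function, and the prefix `http://heise.de/tp/kunst/` is inferred from the record above.

```python
from urllib.parse import quote, urlsplit


def path_to_url(prefix, rel_path):
    """Hypothetical helper (not warcit API): join a URL prefix and a
    relative file path, percent-encoding characters such as '#'."""
    # Keep '/' as the path separator; everything else unsafe is escaped.
    return prefix.rstrip("/") + "/" + quote(rel_path, safe="/")


raw = "http://heise.de/tp/kunst/" + "waste/images/17#.jpg"
escaped = path_to_url("http://heise.de/tp/kunst/", "waste/images/17#.jpg")

# Unescaped, the parser treats '.jpg' as a fragment and truncates the path.
print(urlsplit(raw).path, urlsplit(raw).fragment)

# Escaped, the whole file name stays in the path.
print(escaped)
```

In the unescaped URL the path ends at `17` and `.jpg` becomes the fragment, so a replay tool looking up the target URI may not find the image under the name the page references.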
pip3 install warcit
Fix for [Issue 31](https://github.com/webrecorder/warcit/issues/31) (hopefully). Changed method: `make_index_revisit`. The WARC-Date of the index record is now passed on to the `create_revisit_record` method, so that the revisit record gets the same...
Fixes #27. While migrating my own Python install fully to [UV](https://docs.astral.sh/uv/) tonight, I was unable to get warcit installed due to the issue documented in #27. Figured I'd have a...