Nick Sweeting
I spent a few hours trying and failing to get this up and running on my machine, and eventually gave up due to PHP plugin installation issues. It would be great...
Instead of this: ```python3 class ArchiveResult: path = field.CharField(...) ArchiveResult(path='./archive/warc/somefile.warc.gz') ``` We should be doing this: ```python3 class ArchiveResult: path = field.FileField(...) ArchiveResult(path=Path('./archive/warc/somefile.warc.gz')) ``` `settings.py`: ```python3 MEDIA_URL = 'archive' MEDIA_ROOT...
This is by far the most requested feature. People want an easy way to take multiple snapshots of websites over time. > Here's how archive.org does it > --- For...
DONE: - [x] pass `--single-process --no-zygote` args to chrome in Docker to mitigate orphan subprocess accumulation issues (also made it a lot faster as a side-effect!) 49faec8f6 - [x] fix...
Umbrel is a new OS for homelab self-hosting of Dockerized apps. It looks like a perfect fit for ArchiveBox and it's not difficult for us to add the yaml/config necessary...
Right now the `FETCH_WARC` option only creates a simple WARC of the HTML file with wget; it doesn't save the requests that headless Chrome makes dynamically after JS executes. We should...
Add a `--parallel=8` CLI option to enable using multiprocessing to download a large number of links in parallel. Default to the number of cores on the machine, and allow `--parallel=1` to override it...
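A minimal sketch of how such a flag could behave, assuming an `argparse`-based CLI. The names `build_parser`, `archive_link`, and `archive_all` are hypothetical and are not taken from the ArchiveBox codebase; `archive_link` is only a stand-in for the real per-link archiving work.

```python
import argparse
import multiprocessing
import os
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the proposed --parallel option."""
    parser = argparse.ArgumentParser(description='Hypothetical archiver CLI')
    parser.add_argument(
        '--parallel',
        type=int,
        # Default to the number of cores; os.cpu_count() can return None.
        default=os.cpu_count() or 1,
        help='number of worker processes (1 disables multiprocessing)',
    )
    return parser


def archive_link(url: str) -> str:
    """Placeholder for the real archiving logic (assumption)."""
    return f'archived {url}'


def archive_all(links: List[str], parallel: int) -> List[str]:
    """Archive every link, using a process pool only when parallel > 1."""
    if parallel <= 1:
        # --parallel=1 keeps everything in the current process.
        return [archive_link(url) for url in links]
    # On platforms that spawn workers (macOS, Windows), call this from
    # under an `if __name__ == '__main__':` guard.
    with multiprocessing.Pool(processes=parallel) as pool:
        return pool.map(archive_link, links)
```

For example, `archive_all(links, build_parser().parse_args().parallel)` would fan the links out across one worker per core unless the user passes `--parallel=1`. Keeping the serial path separate makes debugging easier, since no subprocesses are involved.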
Fixes: #578 **Remaining TODOs:** - [ ] figure out which python scheduler to use - huey (my current favorite) - celery (ugh...) - APScheduler (will require lots of manual models...
https://github.com/GoogleChrome/puppeteer is fantastic for scripting actions on pages before making a screenshot or PDF. I could add support for custom puppeteer scripts for certain urls that need a user action...
SingleFile supports a `--browser-height=$HEIGHT` option. We should parse our `DIMENSIONS` config var to get the height, and pass that to SingleFile when archiving to trigger full-page screenshots. This will help...
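A rough sketch of the parsing step, assuming `DIMENSIONS` is a `WIDTH,HEIGHT` string such as `1440,2000` and that the SingleFile binary is invoked as `single-file URL OUTPUT`. Both are assumptions, and the helper names `parse_dimensions` and `singlefile_args` are hypothetical. Only the `--browser-height` flag comes from the description above.

```python
from typing import List, Tuple


def parse_dimensions(dimensions: str) -> Tuple[int, int]:
    """Split an assumed 'WIDTH,HEIGHT' string into two integers."""
    width, height = (int(part.strip()) for part in dimensions.split(','))
    return width, height


def singlefile_args(url: str, output_path: str, dimensions: str) -> List[str]:
    """Build a SingleFile command line that forwards the configured height."""
    _width, height = parse_dimensions(dimensions)
    return [
        'single-file',  # assumed binary name
        f'--browser-height={height}',
        url,
        output_path,
    ]
```

The resulting list could be handed to `subprocess.run` as-is. Building it as a list instead of a shell string avoids quoting problems with unusual URLs.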