Nick Sweeting

Results 140 issues of Nick Sweeting

I spent a few hours trying and failing to get this up and running on my machine, and eventually gave up due to PHP plugin installation issues. It would be great...

Instead of this:

```python3
class ArchiveResult:
    path = field.CharField(...)

ArchiveResult(path='./archive/warc/somefile.warc.gz')
```

We should be doing this:

```python3
class ArchiveResult:
    path = field.FileField(...)

ArchiveResult(path=Path('./archive/warc/somefile.warc.gz'))
```

`settings.py`:

```python3
MEDIA_URL = 'archive'
MEDIA_ROOT...
```

status: idea-phase
size: medium
touches: configuration
touches: data/schema/architecture
why: security
touches: dependencies/packaging
why: performance
touches: docs
type: enhancement

This is by far the most requested feature. People want an easy way to take multiple snapshots of websites over time. > Here's how archive.org does it > --- For...

size: hard
status: idea-phase
touches: data/schema/architecture

DONE:
- [x] pass `--single-process --no-zygote` args to chrome in Docker to mitigate orphan subprocess accumulation issues (also made it a lot faster as a side-effect!) 49faec8f6
- [x] fix...
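As a rough illustration of the first item, the two flags could be appended to the Chrome argument list only when running inside Docker. This is a minimal sketch, not the code from the referenced commit: the `chrome_args` helper and the `in_docker` parameter are hypothetical names, and only the two flags themselves come from the issue text.

```python
# Flags named in the issue; everything else here is an illustrative assumption.
DOCKER_CHROME_ARGS = ['--single-process', '--no-zygote']


def chrome_args(base_args, in_docker):
    """Return a new Chrome argument list, adding the Docker-only flags if needed."""
    args = list(base_args)  # copy so the caller's list is not mutated
    if in_docker:
        # Skip flags that are already present to avoid duplicates
        args += [flag for flag in DOCKER_CHROME_ARGS if flag not in args]
    return args
```

How the Docker environment is detected is left to the caller.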

Umbrel is a new OS for homelab self-hosting of Dockerized apps. It looks like a perfect fit for ArchiveBox and it's not difficult for us to add the yaml/config necessary...

touches: configuration
size: easy
good first ticket
help wanted
touches: docs
status: backlog
type: enhancement

Right now the `FETCH_WARC` option only creates a simple WARC of the HTML file with wget; it doesn't save the requests made dynamically by headless Chrome after JS executes. We should...

size: hard
why: functionality
status: wip

Add a `--parallel=8` CLI option to enable using multiprocessing to download a large number of links in parallel. Default to the number of cores on the machine, and allow `--parallel=1` to override it...
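The option described above could look something like the following. This is a sketch under stated assumptions rather than the project's implementation: `parse_args`, `archive_link`, and `archive_links` are hypothetical names, and the worker is a placeholder that does no real archiving.

```python
import argparse
import os
from multiprocessing import Pool


def parse_args(argv=None):
    """Parse the CLI, defaulting --parallel to the number of cores on the machine."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--parallel',
        type=int,
        default=os.cpu_count() or 1,  # cpu_count() can return None
        help='number of worker processes (1 disables multiprocessing)',
    )
    return parser.parse_args(argv)


def archive_link(url):
    """Hypothetical placeholder for the real per-link archiving work."""
    return url, 'archived'


def archive_links(links, parallel):
    """Archive links serially when parallel <= 1, otherwise in a process pool."""
    if parallel <= 1:
        return [archive_link(url) for url in links]
    with Pool(processes=parallel) as pool:
        # pool.map preserves the input order of the links
        return pool.map(archive_link, links)
```

Calling `archive_links(links, parse_args().parallel)` would then use all cores by default, while `--parallel=1` falls back to a plain serial loop, which is easier to debug.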

size: hard
touches: config
touches: schema/architecture
why: performance
status: backlog

Fixes: #578

**Remaining TODOs:**
- [ ] figure out which python scheduler to use
  - huey (my current favorite)
  - celery (ugh...)
  - APScheduler (will require lots of manual models...

size: medium
touches: config
why: functionality
status: wip
touches: schema/architecture
touches: dependencies
touches: docs
type: enhancement

https://github.com/GoogleChrome/puppeteer is fantastic for scripting actions on pages before making a screenshot or PDF. I could add support for custom puppeteer scripts for certain urls that need a user action...

size: hard
why: functionality
status: wip
touches: data/schema/architecture
status: backlog

SingleFile supports a `--browser-height=$HEIGHT` option. We should parse our `DIMENSIONS` config var to get the height, and pass that to SingleFile when archiving to trigger full-page screenshots. This will help...
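The parsing step might be sketched as follows. It assumes `DIMENSIONS` holds a `WIDTH,HEIGHT` string such as `'1440,2000'`, which is not stated in the issue, and `parse_height` and `singlefile_args` are hypothetical helper names. Only the `--browser-height` flag comes from the text above.

```python
def parse_height(dimensions):
    """Extract the height from a 'WIDTH,HEIGHT' string (assumed format)."""
    _width, height = dimensions.split(',')
    return int(height)  # int() tolerates surrounding whitespace


def singlefile_args(dimensions):
    """Build the extra SingleFile argument carrying the configured height."""
    return [f'--browser-height={parse_height(dimensions)}']
```

A value that does not contain exactly two comma-separated numbers raises `ValueError`, so a malformed config is reported early instead of being passed on to SingleFile.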

status: idea-phase
touches: configuration
why: functionality
size: easy
type: enhancement
expected: maybe someday