warcworker

A dockerized, queued high fidelity web archiver based on Squidwarc

Results 7 warcworker issues

Hi, I'm exploring tools for crawling social media. I got a FileNotFoundError after starting a crawl with scroll_everything selected as the script: `FileNotFoundError: [Errno 2] No such file or...`

* How do I pull an entire website with this?
* How do I see what it is doing internally?

![image](https://user-images.githubusercontent.com/19284/43676896-e4c3276e-97f9-11e8-815c-0ab5c1cc254f.png)

The screenshot is currently saved in the root archive folder. It would be great to have it saved in the job directory instead.

One of the use cases I have wanted to support in Squidwarc is multiple worker crawlers populating and pulling from a single master frontier. As well as a move from...

When selecting which user scripts to run, make it possible to configure the order.

Currently the worker uses Python 3.6 compiled from source. It could probably just as well use the JavaScript facilities bundled with the base image to work on queue items...