Corentin Barreau
Thanks for your contribution @machawk1. To be noted: **WARC writing is async**. The WARC writing queue is displayed when using Zeno with `--live-stats`. One edge case that is not handled...
Thanks! Really sorry for the delay here..
I think it should be a setting in Zeno that turns on a setting in the WARC library.
> I am trying to work on this issue and have successfully setup the codebase locally, just need a little help understanding the codebase and where would I add this...
Hi @Qu-Ack did you have an idea for which lib you would use? I'm looking around and can't find a good one.
> Which commit?

2ac2af599f020af548d67edd9e4d16c03fc41d72
> OCR might be slow and inaccurate, but how about extracting URLs from QR codes in images?

Very good idea. (not a priority though, maybe it should be another issue?)
Do we want to have an option to disable this? (in order to save some disk I/O when we know the crawl will be short and we don't care about...
> also can we display the stats using `--live-stats` flag?

Nah, I don't think we should add too much stuff to live-stats.. We already added a lot with the recent...
It's strange, it looks like it works, but not for all workers. Right now I have a crawl that is displaying the low disk space pausing message, and 1 worker...