Corentin Barreau

Results: 77 comments by Corentin Barreau

Thanks for your contribution, @machawk1. Note that **WARC writing is async**. The WARC writing queue is displayed when using Zeno with `--live-stats`. One edge case that is not handled...

I think it should be a setting in Zeno that turns on a setting in the WARC library.

> I am trying to work on this issue and have successfully setup the codebase locally, just need a little help understanding the codebase and where would I add this...

Hi @Qu-Ack, do you have an idea of which lib you would use? I'm looking around and can't find a good one.

> Which commit?

2ac2af599f020af548d67edd9e4d16c03fc41d72

> OCR might be slow and inaccurate, but how about extracting URLs from QR codes in images?

Very good idea. (not a priority though, maybe it should be another issue?)

Do we want to have an option to disable this? (in order to save some disk I/O when we know the crawl will be short and we don't care about...

> also can we display the stats using `--live-stats` flag?

Nah, I don't think we should add too much stuff to live-stats. We already added a lot with the recent...

It's strange, it looks like it works, but not for all workers. Right now I have a crawl that is displaying the low disk space pausing message, and 1 worker...