Alexander Sibiryakov
The recent ones: HBase 1.x.x
@nautilus28 in your previous message the error is quite clear - you have to create a namespace. RTFM http://frontera.readthedocs.io/en/latest/topics/production-broad-crawling.html#production-broad-crawling
Yes, that's confusing. At the same time it depends on the canonical solver behavior. So I think the right way is to mark the URL considered canonical as CRAWLED, and leave...
Good finding again! I'm not sure that will help to completely avoid the `queued` status when the request isn't actually queued. The spider process can be killed, so everything in the queue will be...
This could work, but it would be nice to pass:
- Error type: where did it come from, e.g. downloader or spider?
- In case of spider, the response object.
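A purely hypothetical sketch of what such a hook could look like; none of these names come from Frontera's actual API:

```python
# Hypothetical sketch only: an error hook carrying the error type (where it
# came from) and, for spider-side failures, the response object. Names here
# are invented for illustration, not Frontera's API.
from enum import Enum


class ErrorSource(Enum):
    DOWNLOADER = "downloader"
    SPIDER = "spider"


def report_error(request, error, source, response=None):
    """Forward a failed request together with where the failure originated."""
    if source is ErrorSource.SPIDER and response is None:
        raise ValueError("spider-side errors should include the response object")
    # hand off to whatever error-handling logic the backend implements
```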
Anyone here to estimate the consequences of that?
IMO, it's a premature optimization. Most of the time people use the memory backend just to check that the scrapy+frontera+spider mix is working, and to debug issues in other backends. IMO, no...
I rewrote memory LIFO and FIFO using `deque` in https://github.com/scrapinghub/frontera/pull/81
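For reference, a minimal sketch of what deque-backed FIFO/LIFO memory queues look like; this is illustrative, not the exact code from the PR:

```python
# Minimal sketch of deque-backed FIFO/LIFO memory queues (illustrative, not
# the exact code from the PR). deque gives O(1) appends and pops at both
# ends, unlike list.pop(0) which is O(n).
from collections import deque


class MemoryFIFOQueue:
    """First-in, first-out: requests come back in insertion order."""

    def __init__(self):
        self._queue = deque()

    def put(self, request):
        self._queue.append(request)

    def get(self):
        return self._queue.popleft()


class MemoryLIFOQueue(MemoryFIFOQueue):
    """Last-in, first-out: the most recently added request comes back first."""

    def get(self):
        return self._queue.pop()
```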
Any ideas what could be more efficient for DFS and BFS than heapq?
@wilfre DFS stands for Depth-first, and BFS for Breadth-first. If you imagine a link graph, then DFS crawls the links having the biggest distance from the root (seeds in our...
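A rough sketch of the idea, assuming a per-request `depth` (distance from the seeds) is available: heapq keeps requests ordered by that distance, shallowest-first for BFS and deepest-first for DFS. The class and argument names are illustrative, not Frontera's actual queue interface.

```python
# Sketch of heapq ordering requests by distance from the seeds. The `depth`
# argument and the queue class are assumptions for illustration, not
# Frontera's actual request/queue model.
import heapq
import itertools


class DepthOrderedQueue:
    def __init__(self, dfs=False):
        self._heap = []
        self._tie = itertools.count()    # stable tie-breaker, avoids comparing requests
        self._sign = -1 if dfs else 1    # DFS: deepest first; BFS: shallowest first

    def put(self, request, depth):
        heapq.heappush(self._heap, (self._sign * depth, next(self._tie), request))

    def get(self):
        return heapq.heappop(self._heap)[2]
```

Both put and get are O(log n) on the heap; deque gets away with O(1) for FIFO/LIFO only because those orderings depend purely on insertion order, which is why heapq is hard to beat once ordering depends on a score like depth.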