Alexander Sibiryakov
The recent ones: HBase 1.x.x
@nautilus28 in your previous message the error is quite clear - you have to create a namespace. RTFM http://frontera.readthedocs.io/en/latest/topics/production-broad-crawling.html#production-broad-crawling
Yes, that's confusing. At the same time it depends on the canonical solver behavior. So I think the right way is to mark the URL considered canonical as CRAWLED, and leave...
Good finding again! I'm not sure that will help to completely avoid the `queued` status when the request isn't actually queued. The spider process can be killed, so everything in the queue will be...
This could work, but it would be nice to pass:
- Error type: where did it come from, e.g. downloader or spider?
- In case of spider, the response object.
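A purely hypothetical sketch of what such a hook could look like; none of these names come from Frontera's actual API:

```python
# Hypothetical sketch only: an error hook carrying the error type (where it
# came from) and, for spider-side failures, the response object. Names here
# are invented for illustration, not Frontera's API.
from enum import Enum


class ErrorSource(Enum):
    DOWNLOADER = "downloader"
    SPIDER = "spider"


def report_error(request, error, source, response=None):
    """Forward a failed request together with where the failure originated."""
    if source is ErrorSource.SPIDER and response is None:
        raise ValueError("spider-side errors should include the response object")
    # hand off to whatever error-handling logic the backend implements
```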
Anyone here to estimate the consequences of that?
IMO, it's a premature optimization. Most of the time people use the memory backend just to check that the scrapy+frontera+spider mix is working, and to debug issues in other backends. IMO, no...
I rewrote memory LIFO and FIFO using `deque` in https://github.com/scrapinghub/frontera/pull/81
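For reference, a minimal sketch of what deque-backed FIFO/LIFO memory queues look like; this is illustrative, not the exact code from the PR:

```python
# Minimal sketch of deque-backed FIFO/LIFO memory queues (illustrative, not
# the exact code from the PR). deque gives O(1) appends and pops at both
# ends, unlike list.pop(0) which is O(n).
from collections import deque


class MemoryFIFOQueue:
    """First-in, first-out: requests come back in insertion order."""

    def __init__(self):
        self._queue = deque()

    def put(self, request):
        self._queue.append(request)

    def get(self):
        return self._queue.popleft()


class MemoryLIFOQueue(MemoryFIFOQueue):
    """Last-in, first-out: the most recently added request comes back first."""

    def get(self):
        return self._queue.pop()
```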
Any ideas what could be more efficient for DFS and BFS than heapq?
@wilfre DFS stands for Depth-first, and BFS for Breadth-first. If you imagine a link graph, then DFS crawls the links having the biggest distance from the root (seeds in our...
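A rough sketch of the idea, assuming a per-request `depth` (distance from the seeds) is available: heapq keeps requests ordered by that distance, shallowest-first for BFS and deepest-first for DFS. The class and argument names are illustrative, not Frontera's actual queue interface.

```python
# Sketch of heapq ordering requests by distance from the seeds. The `depth`
# argument and the queue class are assumptions for illustration, not
# Frontera's actual request/queue model.
import heapq
import itertools


class DepthOrderedQueue:
    def __init__(self, dfs=False):
        self._heap = []
        self._tie = itertools.count()    # stable tie-breaker, avoids comparing requests
        self._sign = -1 if dfs else 1    # DFS: deepest first; BFS: shallowest first

    def put(self, request, depth):
        heapq.heappush(self._heap, (self._sign * depth, next(self._tie), request))

    def get(self):
        return heapq.heappop(self._heap)[2]
```

Both put and get are O(log n) on the heap; deque gets away with O(1) for FIFO/LIFO only because those orderings depend purely on insertion order, which is why heapq is hard to beat once ordering depends on a score like depth.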