Alexander Sibiryakov
`count 82257 avg 794.1626245 median 644 90% 1144` for GURL. So the Yandex one is 1.25x to 2x faster. Maybe this is connected with more efficient memory allocation...
I've got an idea: let's create a library supporting batch operations on URL parsing. For Scrapy this should be a common use case. Let me know what you think!
I made a wrong conclusion about the Yandex parser being 1000 times faster, and have updated the comment.
A batch of URLs as input, and the response is a vector of results.
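For illustration, a minimal sketch of what such a batch interface could look like, in Python with the standard library's `urlparse` standing in for a faster native parser; the `parse_batch` name and the result shape are hypothetical, not an existing API:

```python
from urllib.parse import urlparse

def parse_batch(urls):
    """Parse a batch of URLs in one call, returning results in input order.

    A native implementation could amortize allocations across the whole
    batch; urlparse here is only a placeholder for the real parser.
    """
    results = []
    for url in urls:
        try:
            results.append(urlparse(url))
        except ValueError:
            results.append(None)  # keep positions aligned with the input
    return results

if __name__ == "__main__":
    batch = ["https://scrapy.org/", "http://example.com/a?b=c#d"]
    for url, parsed in zip(batch, parse_batch(batch)):
        print(url, "->", parsed.netloc if parsed else "invalid")
```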
Here is the testing code: https://github.com/sibiryakov/balancer/blob/urlbench/tools/urlbench/main.cpp
Hi @liho00, your seeds weren't injected because the strategy worker was unable to create the table `crawler:queue`. Check that it can connect to the HBase Thrift server and that the namespace `crawler` exists.
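As a quick sanity check, you can connect with happybase (the Thrift client Frontera's HBase backend is built on) and list the tables; the host and port below are assumptions for a default Thrift setup:

```python
import happybase

# Assumed host/port of a default HBase Thrift server; adjust to your setup.
connection = happybase.Connection(host="localhost", port=9090)
connection.open()

# If this call fails, the Thrift server is unreachable.
tables = connection.tables()
print("tables:", tables)

# happybase doesn't expose namespace creation; create it in the hbase shell:
#   create_namespace 'crawler'
if b"crawler:queue" not in tables:
    print("crawler:queue is missing -- check that the 'crawler' namespace exists")
```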
@Gallaecio it should be a tiny PR https://github.com/scrapinghub/frontera/issues/371#issuecomment-500197551
Hi @DiscipleOfOne, the right approach would be to follow this guide https://github.com/scrapy-plugins/scrapy-splash#configuration and use `scrapy.Request` with the `splash` meta key.
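For illustration, a minimal spider using the `splash` meta key, assuming the middlewares from the linked configuration guide are already enabled in your settings; the URL and the `wait` argument are placeholders:

```python
import scrapy

class SplashSpider(scrapy.Spider):
    name = "splash_example"
    start_urls = ["https://example.com"]  # placeholder URL

    def start_requests(self):
        for url in self.start_urls:
            # The 'splash' meta key tells the scrapy-splash middleware
            # to render the page through Splash before calling back.
            yield scrapy.Request(
                url,
                callback=self.parse,
                meta={
                    "splash": {
                        "endpoint": "render.html",
                        "args": {"wait": 0.5},
                    }
                },
            )

    def parse(self, response):
        # response.body is now the HTML rendered by Splash.
        yield {"title": response.css("title::text").get()}
```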
Is there any other process using the same SQLite file? The strategy worker, maybe?
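One rough way to check for a competing writer: try to take a write lock on the same file with a short timeout. The path below is a placeholder for whatever SQLite file your Frontera settings point at:

```python
import sqlite3

# Placeholder path; point it at the SQLite file Frontera is configured to use.
DB_PATH = "frontera.db"

try:
    conn = sqlite3.connect(DB_PATH, timeout=1)
    # BEGIN IMMEDIATE tries to take a write lock right away; it raises
    # "database is locked" if another process is currently writing.
    conn.execute("BEGIN IMMEDIATE")
    conn.rollback()
    print("no competing writer detected")
except sqlite3.OperationalError as exc:
    print("could not lock the database:", exc)
```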
Hey @amitsing89, I think you should rebase it onto the latest master; it seems to me your code is based on an outdated version.