Alexander Sibiryakov
`count 82257 avg 794.1626245 median 644 90% 1144` for GURL. So the Yandex one is 1.25x to 2x faster. Maybe this is connected with more efficient memory allocation...
I've got an idea: let's create a library supporting batch operations on URL parsing. For Scrapy this should be a common use case. Let me know what you think!
I made a wrong conclusion about the Yandex parser being 1000 times faster, and have updated the comment.
A batch of URLs as input, and the response is a vector of results.
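For illustration, a minimal sketch of what such a batch interface could look like, in Python with the standard library's `urlparse` standing in for a faster native parser; the `parse_batch` name and the result shape are hypothetical, not an existing API:

```python
from urllib.parse import urlparse

def parse_batch(urls):
    """Parse a batch of URLs in one call, returning results in input order.

    A native implementation could amortize allocations across the whole
    batch; urlparse here is only a placeholder for the real parser.
    """
    results = []
    for url in urls:
        try:
            results.append(urlparse(url))
        except ValueError:
            results.append(None)  # keep positions aligned with the input
    return results

if __name__ == "__main__":
    batch = ["https://scrapy.org/", "http://example.com/a?b=c#d"]
    for url, parsed in zip(batch, parse_batch(batch)):
        print(url, "->", parsed.netloc if parsed else "invalid")
```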
Here is the testing code: https://github.com/sibiryakov/balancer/blob/urlbench/tools/urlbench/main.cpp
Hi @liho00, your seeds weren't injected because the strategy worker was unable to create the table `crawler:queue`. Check that it can connect to the HBase Thrift server and that the namespace `crawler` exists.
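As a quick sanity check, you can connect with happybase (the Thrift client Frontera's HBase backend is built on) and list the tables; the host and port below are assumptions for a default Thrift setup:

```python
import happybase

# Assumed host/port of a default HBase Thrift server; adjust to your setup.
connection = happybase.Connection(host="localhost", port=9090)
connection.open()

# If this call fails, the Thrift server is unreachable.
tables = connection.tables()
print("tables:", tables)

# happybase doesn't expose namespace creation; create it in the hbase shell:
#   create_namespace 'crawler'
if b"crawler:queue" not in tables:
    print("crawler:queue is missing -- check that the 'crawler' namespace exists")
```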
@Gallaecio it should be a tiny PR https://github.com/scrapinghub/frontera/issues/371#issuecomment-500197551
Hi @DiscipleOfOne, the right approach would be to follow this guide https://github.com/scrapy-plugins/scrapy-splash#configuration and use `scrapy.Request` with the `splash` meta key.
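For illustration, a minimal spider using the `splash` meta key, assuming the middlewares from the linked configuration guide are already enabled in your settings; the URL and the `wait` argument are placeholders:

```python
import scrapy

class SplashSpider(scrapy.Spider):
    name = "splash_example"
    start_urls = ["https://example.com"]  # placeholder URL

    def start_requests(self):
        for url in self.start_urls:
            # The 'splash' meta key tells the scrapy-splash middleware
            # to render the page through Splash before calling back.
            yield scrapy.Request(
                url,
                callback=self.parse,
                meta={
                    "splash": {
                        "endpoint": "render.html",
                        "args": {"wait": 0.5},
                    }
                },
            )

    def parse(self, response):
        # response.body is now the HTML rendered by Splash.
        yield {"title": response.css("title::text").get()}
```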
Is there any other process using the same SQLite file? The strategy worker, maybe?
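One rough way to check for a competing writer: try to take a write lock on the same file with a short timeout. The path below is a placeholder for whatever SQLite file your Frontera settings point at:

```python
import sqlite3

# Placeholder path; point it at the SQLite file Frontera is configured to use.
DB_PATH = "frontera.db"

try:
    conn = sqlite3.connect(DB_PATH, timeout=1)
    # BEGIN IMMEDIATE tries to take a write lock right away; it raises
    # "database is locked" if another process is currently writing.
    conn.execute("BEGIN IMMEDIATE")
    conn.rollback()
    print("no competing writer detected")
except sqlite3.OperationalError as exc:
    print("could not lock the database:", exc)
```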
Hey @amitsing89, I think you should rebase it onto the latest master; it seems to me your code is based on an outdated version.