Alexander Sibiryakov comments

Results: 124 comments by Alexander Sibiryakov

More efficient memory backends

@wilfre Thanks for your suggestion, it looks interesting. Two points: 1. we don't have a link graph until the documents are crawled, therefore G will be updated on every iteration...

Request count in Hbase backend

There are stats in the DB worker and the SW worker. You could check whether these stats are available via JSONRPC. There are such things as batches after...

WIP: Check if table already exists before creating it

@lopuhin Concurrent table creation is something we should avoid doing. Such behavior isn't expected by either DB servers or clients. I recommend redesigning your application to avoid doing...

WIP: Check if table already exists before creating it

I would like to test that behavior. What if we test our backends with `SQLALCHEMYBACKEND_DROP_ALL_TABLES` enabled?

Check if travis has updated its images, to remove extra code added in tox.ini in PR

What about this, @voith? Could you check whether this has happened?

frontera is being polite to my splash server. How to disable that?

Please send your Scrapy spider settings. It's hard to guess how you generate Splash requests. I would expect that Scrapy is generating the robots.txt requests, and that is managed with http://doc.scrapy.org/en/latest/topics/settings.html#robotstxt-obey
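For reference, the setting behind that link is a single flag in the Scrapy project's settings module. A minimal fragment, assuming robots.txt handling really is what produces the extra requests (the comment above only suspects this):

```python
# settings.py of the Scrapy project (file name follows the usual Scrapy layout).
# ROBOTSTXT_OBEY is the Scrapy setting documented at the link above;
# False tells Scrapy not to fetch and respect robots.txt.
ROBOTSTXT_OBEY = False
```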

frontera is being polite to my splash server. How to disable that?

What do you call frontier rules?

setting to switch off exception when encountering same url fingerprint

@RajatGoyal Are you trying to do that in the same process? Or in a few different processes using the same database? Actually, Frontera wasn't tested in either of these configurations, and do...

setting to switch off exception when encountering same url fingerprint

@RajatGoyal Please tell me more about the overall problem you're trying to solve, so I'll be able to suggest a better architecture.

setting to switch off exception when encountering same url fingerprint

SQLiteBackend isn't designed for parallel access from different processes. During intensive writes there's a high probability that a writing process will have outdated state and, based on that, will try to...
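The kind of conflict described here can be reproduced in a rough sketch with the standard-library `sqlite3` module. This is not Frontera's SQLiteBackend code: the `pages` table, the `seen` helper and the two connections standing in for two processes are all hypothetical, chosen only to show how a writer acting on an outdated check runs into a duplicate fingerprint.

```python
import os
import sqlite3
import tempfile


def seen(conn, fingerprint):
    """Return True if this connection currently sees the fingerprint."""
    row = conn.execute(
        "SELECT 1 FROM pages WHERE fingerprint = ?", (fingerprint,)
    ).fetchone()
    return row is not None


# Two connections to one database file stand in for two separate processes.
path = os.path.join(tempfile.mkdtemp(), "frontier.db")
writer_a = sqlite3.connect(path)
writer_b = sqlite3.connect(path)
writer_a.execute("CREATE TABLE pages (fingerprint TEXT PRIMARY KEY, url TEXT)")
writer_a.commit()

fp = "abc123"  # hypothetical URL fingerprint

# Both writers check first and both conclude the URL is new.
a_thinks_new = not seen(writer_a, fp)
b_thinks_new = not seen(writer_b, fp)

# Writer A stores the row; writer B's earlier check is now outdated.
writer_a.execute("INSERT INTO pages VALUES (?, ?)", (fp, "http://example.com/"))
writer_a.commit()

try:
    writer_b.execute("INSERT INTO pages VALUES (?, ?)", (fp, "http://example.com/"))
    writer_b.commit()
    conflict = False
except sqlite3.IntegrityError:
    # The primary key rejects the second insert of the same fingerprint.
    writer_b.rollback()
    conflict = True

writer_a.close()
writer_b.close()
```

Both writers pass their own check, yet the second insert is rejected by the primary key, which is the sort of exception the issue title refers to.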