Mikhail Korobov comments

Results: 479 comments of Mikhail Korobov

Explain how to use scrapy-splash with AutoThrottle

There are ways to make it work with AutoThrottle in a more reasonable way, e.g. https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py. As a first step - yes, it makes sense to at least document this...

Handling multiple window or pop-ups

No, multiple windows and non-JS popups are not supported by Splash at the moment.

Handling multiple window or pop-ups

I just meant that some popups are implemented as HTML elements over the webpage, as opposed to opening a new browser window, and you can work fine with these popups.

Bad request to Splash & HTTP status code is not handled or not allowed

Yeah, that can be the problem. It is caused by the cache: when a response is fetched from an in-memory cache, it doesn't get a record in splash:history. I don't have a...

Middleware settings for scrapy-splash with scrapy-cluster, SplashRequest not work

I think it could be related to the dupefilter used by [crawling.distributed_scheduler.DistributedScheduler](https://github.com/istresearch/scrapy-cluster/blob/93a0bd069fe005963b120719c0da9636f24cf289/crawler/crawling/distributed_scheduler.py#L16) - this dupefilter [uses](https://github.com/istresearch/scrapy-cluster/blob/master/crawler/crawling/redis_dupefilter.py#L23) the request_fingerprint function, which doesn't work correctly for Splash requests. The default dupefilter doesn't take request.meta values...
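
To illustrate why a fingerprint that ignores request.meta can misbehave for Splash requests, here is a minimal standard-library sketch. It is not the code of scrapy-splash or scrapy-cluster; `naive_fingerprint` and `splash_aware_fingerprint` are hypothetical helpers, and the shape of the `splash_args` dict is an assumption made for the example.

```python
import hashlib
import json


def naive_fingerprint(method, url):
    """Hypothetical fingerprint built only from method and URL (meta is ignored)."""
    payload = json.dumps({"method": method.upper(), "url": url}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def splash_aware_fingerprint(method, url, splash_args=None):
    """Hypothetical fingerprint that also hashes the Splash arguments, if any."""
    parts = {"method": method.upper(), "url": url}
    if splash_args is not None:
        parts["splash"] = splash_args
    # sort_keys makes the result independent of dict key ordering.
    payload = json.dumps(parts, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Same target URL, but the two requests ask Splash to render it differently.
url = "http://example.com/page"
args_short_wait = {"wait": 0.5, "endpoint": "render.html"}
args_long_wait = {"wait": 5, "endpoint": "render.html"}

# The naive fingerprint cannot tell the two requests apart...
naive_collides = naive_fingerprint("GET", url) == naive_fingerprint("GET", url)

# ...while the Splash-aware one can.
aware_short = splash_aware_fingerprint("GET", url, args_short_wait)
aware_long = splash_aware_fingerprint("GET", url, args_long_wait)
aware_differs = aware_short != aware_long
```

With a meta-blind fingerprint, the second request would be dropped as a duplicate even though it asks for a different rendering.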

Middleware settings for scrapy-splash with scrapy-cluster, SplashRequest not work

See also: https://github.com/istresearch/scrapy-cluster/issues/94. I'm not sure how it can be solved in scrapy-splash itself.

Middleware settings for scrapy-splash with scrapy-cluster, SplashRequest not work

Yes, it can't. Currently one has to fork and fix scrapy-cluster to make them work together. An alternative is to use the Splash HTTP API directly, as shown at https://github.com/scrapy-plugins/scrapy-splash#why-not-use-the-splash-http-api-directly;...
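
As a rough sketch of what using the Splash HTTP API directly can look like, the snippet below builds a `render.html` URL with the standard library only. The base address `http://localhost:8050`, the `wait` value, and the helper name `splash_render_url` are assumptions for illustration; the resulting URL would then be requested like any other page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=0.5):
    """Hypothetical helper: build a Splash render.html URL for target_url."""
    # urlencode percent-escapes the target URL so it survives as a query value.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


render_url = splash_render_url("http://example.com/page?id=1")

# Round-trip check: decoding the query gives back the original arguments.
decoded = parse_qs(urlsplit(render_url).query)
```

The trade-off is that every request now targets the Splash address, so anything keyed on the request URL (dupefilters, per-domain limits, caches) sees Splash rather than the target site.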

Middleware settings for scrapy-splash with scrapy-cluster, SplashRequest not work

@wenxzhen I'm not a scrapy-cluster user myself, but the results of a brief look are in this comment: https://github.com/scrapy-plugins/scrapy-splash/issues/101#issuecomment-274729809

Migrate scrapy to headless-chrome?

WebKit has been upgraded to a much more recent version in Splash master (~mid-2016 Safari), and will be upgraded further (to WebKit trunk) in the future, thanks to https://github.com/annulen/webkit. You can use...

Migrate scrapy to headless-chrome?

Could you try it again? It looks like a temporary issue - either a dockerhub issue, or a network issue.