Mikhail Korobov
There are ways to make it work with AutoThrottle in a more reasonable way, e.g. https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py. As a first step - yes, it makes sense to at least document this...
No, multiple windows and non-JS popups are not supported by Splash at the moment.
I just meant that some popups are implemented as HTML elements overlaid on the webpage, as opposed to opening a new browser window, and you can work with these popups just fine.
Yeah, that can be the problem. It is caused by the cache: when a response is fetched from the in-memory cache, it doesn't get a record in splash:history. I don't have a...
I think it could be related to the dupefilter used by [crawling.distributed_scheduler.DistributedScheduler](https://github.com/istresearch/scrapy-cluster/blob/93a0bd069fe005963b120719c0da9636f24cf289/crawler/crawling/distributed_scheduler.py#L16): this dupefilter [uses](https://github.com/istresearch/scrapy-cluster/blob/master/crawler/crawling/redis_dupefilter.py#L23) the request_fingerprint function, which doesn't work correctly for Splash requests. The default dupefilter doesn't take request.meta values...
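To illustrate the fingerprint problem, here is a minimal sketch. It uses plain dicts in place of Scrapy requests, and `basic_fingerprint` and `splash_aware_fingerprint` are hypothetical names made up for this example, not functions from Scrapy, scrapy-cluster or scrapy-splash. It assumes the Splash arguments live under `meta["splash"]["args"]`.

```python
import hashlib
import json


def basic_fingerprint(request):
    """Hash only method, URL and body; request meta is ignored (hypothetical helper)."""
    h = hashlib.sha1()
    h.update(request["method"].encode())
    h.update(request["url"].encode())
    h.update(request.get("body", b""))
    return h.hexdigest()


def splash_aware_fingerprint(request):
    """Extend the basic fingerprint with the Splash arguments, if any (hypothetical helper)."""
    fp = basic_fingerprint(request)
    splash_args = request.get("meta", {}).get("splash", {}).get("args")
    if not splash_args:
        return fp
    h = hashlib.sha1(fp.encode())
    # sort_keys makes the hash independent of dict ordering
    h.update(json.dumps(splash_args, sort_keys=True).encode())
    return h.hexdigest()


# Two requests for the same URL that differ only in their Splash arguments.
req_a = {"method": "GET", "url": "http://example.com/",
         "meta": {"splash": {"args": {"wait": 0.5}}}}
req_b = {"method": "GET", "url": "http://example.com/",
         "meta": {"splash": {"args": {"wait": 5}}}}

print(basic_fingerprint(req_a) == basic_fingerprint(req_b))                # collide
print(splash_aware_fingerprint(req_a) == splash_aware_fingerprint(req_b))  # distinct
```

A dupefilter built on the first function would treat both requests as duplicates; one built on the second would keep them apart.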
See also: https://github.com/istresearch/scrapy-cluster/issues/94. I'm not sure how it can be solved in scrapy-splash itself.
Yes, it can't. Currently one has to fork & fix scrapy-cluster to make them work together. An alternative is to use the Splash HTTP API directly, as shown at https://github.com/scrapy-plugins/scrapy-splash#why-not-use-the-splash-http-api-directly;...
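A rough sketch of the direct-API approach: build a plain URL for the `render.html` endpoint and hand it to the crawler as a regular request. The helper name `splash_render_url`, the base address `http://localhost:8050` and the `wait` value are assumptions for illustration only.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a render.html URL for a Splash instance (hypothetical helper)."""
    # The target page and rendering options are passed as query arguments.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


render_url = splash_render_url("http://example.com/page?id=1")
print(render_url)
```

Because the target URL and the rendering options end up in the request URL itself, a URL-based fingerprint can tell such requests apart without looking at request.meta.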
@wenxzhen I'm not a scrapy-cluster user myself, but the results of a brief look are in this comment: https://github.com/scrapy-plugins/scrapy-splash/issues/101#issuecomment-274729809
WebKit has been upgraded to a much more recent version in Splash master (~mid-2016 Safari), and will be upgraded further (to WebKit trunk) in the future, thanks to https://github.com/annulen/webkit. You can use...
Could you try it again? It looks like a temporary issue: either a Docker Hub issue or a network issue.