Mikhail Korobov comments

Results: 479 comments of Mikhail Korobov

selector create_root_node memory issues

> As far as I understand - result of this function is used to stop spiders on scrapy cloud - this is main reason of usage of this method for...

What do you think about Selector(response).xpath().map() ?

I'm also on the fence; let's say -0 on adding such a shortcut, as it doesn't add much and it is not composable. To make processing code shorter and more composable I'd...

Add option to retrieve text content

Hey @frederik-elwert! This is being worked on here: https://github.com/scrapy/parsel/pull/127 :)

Add option to retrieve text content

Not much, but I merged master into #127 yesterday, so the PR is up to date now. I think feature-wise it is ready; I'm happy with the implementation. But it needs some...

[Feature Request] Add support for JMESPath

There are useful use cases for chaining (e.g. processing data- attributes), but I don't think they are worth the extra complexity we may introduce to support them. `response.jmespath(...)` or `jmespath.search(...)` covers most...

[Feature Request] Add support for JMESPath

I agree that option (1) looks easy enough to implement, but has anyone had a real use case for it? If I understood @voith properly, he wanted to parse JSON...

Add example integration with pyrebloom

A dupefilter based on a bloom filter can be dangerous because some requests may be incorrectly dropped: a bloom filter can only be 100% trusted when it says the request...

Add example integration with pyrebloom

@rafaelcapucho

- request is not seen -> process it and add to seen requests. This always works properly, so there are no requests processed more than 1 time.
- request...

Add example integration with pyrebloom

@rafaelcapucho I mean that Scrapy asks dupefilter a question: "is this request seen?". There are two possible answers:

- yep, request is seen;
- nope, request is not seen.

When...
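
To make the asymmetry between the two answers concrete, here is a minimal sketch of a Bloom filter used in a dupefilter-style loop. It is not pyrebloom and not Scrapy's dupefilter API: `TinyBloomFilter` and `count_wrongly_dropped` are hypothetical names, and the filter is deliberately tiny (64 bits) so that false positives show up quickly.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter; deliberately small so false positives occur."""

    def __init__(self, size_bits=64, num_hashes=2):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-1 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def count_wrongly_dropped(urls, bloom):
    """Feed unique URLs through the filter the way a dupefilter would.

    Returns (wrongly_dropped, processed). An exact set is kept alongside
    the filter only to check the filter's answers.
    """
    truly_seen = set()
    wrongly_dropped = 0
    for url in urls:
        if url in bloom:
            # "Seen" may be wrong: the bits can be set by other URLs.
            if url not in truly_seen:
                wrongly_dropped += 1
            continue
        # "Not seen" is always right: a Bloom filter has no false negatives.
        assert url not in truly_seen
        bloom.add(url)
        truly_seen.add(url)
    return wrongly_dropped, len(truly_seen)


urls = [f"http://example.com/page/{i}" for i in range(500)]
dropped, processed = count_wrongly_dropped(urls, TinyBloomFilter())
print(f"processed={processed}, wrongly dropped={dropped}")
```

Every URL here is unique, so each "seen" answer is a false positive and a request that would have been silently lost. A realistically sized filter makes this much rarer, but does not eliminate it.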

Add an option to send requests to Splash by default

I like it, but I'm not sure this

```
yield Request(url, self.parse_result, meta={'splash': True})
```

is better than this:

```
yield Request(url, self.parse_result, meta={'splash': self.splash_options})
```
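
One way to see the trade-off is to sketch what a middleware would have to do with each form. This is only an illustration under assumptions: `resolve_splash_meta` and `DEFAULT_SPLASH_OPTIONS` are hypothetical names, not part of any existing middleware, and the option values are made up.

```python
# Hypothetical project-wide defaults; the values are illustrative only.
DEFAULT_SPLASH_OPTIONS = {"args": {"wait": 0.5}}


def resolve_splash_meta(meta, defaults):
    """Return the Splash options to use for a request, or None.

    - meta['splash'] is True   -> fall back to the shared defaults
    - meta['splash'] is a dict -> use the options given on the request
    - anything else            -> the request does not go through Splash
    """
    value = meta.get("splash")
    if value is True:
        return dict(defaults)  # copy so callers cannot mutate the defaults
    if isinstance(value, dict):
        return value
    return None


# The two spellings from the comment above, reduced to their meta dicts.
print(resolve_splash_meta({"splash": True}, DEFAULT_SPLASH_OPTIONS))
print(resolve_splash_meta({"splash": {"args": {"wait": 2}}}, DEFAULT_SPLASH_OPTIONS))
print(resolve_splash_meta({}, DEFAULT_SPLASH_OPTIONS))
```

With `True`, the defaults live somewhere the middleware must know about; with an explicit dict, the request carries everything and no extra lookup is needed.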