Adrián Chaves
I think this may need a bit of community discussion. I also worry that it would not be obvious to users of this method how project settings are ignored when...
From your findings, could the header order be the issue? Some antibot software I believe takes that into account.
I don’t think we should enable it by default. But maybe we should document this as one thing to try when getting unexpected responses.
> I tried, but failed, to create a DOWNLOADER_MIDDLEWARES that would use requests.get() to fetch the pages. Has anyone ever done this?

Sounds interesting as a proof of concept....
> I'm pretty sure the difference is in order and/or case of headers.

For those we have https://github.com/scrapy/scrapy/issues/2711 and https://github.com/scrapy/scrapy/issues/2803, so if that’s the case we could probably close this...
Can you provide a [self-contained, minimal example](https://stackoverflow.com/help/minimal-reproducible-example) to reproduce the issue?
This is what a minimal example looks like to me:

```python
from scrapy.http import TextResponse
from scrapy.link import Link
from scrapy.linkextractors import LinkExtractor


def process_value(url):
    return url


link_extractor = LinkExtractor(...
```
Ah, I see. So the problem is that `process_value` is called before `allow` is taken into account. I assume there are scenarios where the current behavior is desired. For example,...
I agree that, if the behavior is not modified, it should be documented.
I see you have now removed the setting; I am not sure why. I think it would be best to keep it. It does complicate the implementation, though. When creating an instance...