crawlee
Smart proxy configuration rotator
Which package is the feature request for? If unsure which one to select, leave blank
No response
Feature
Currently, we allow only a single ProxyConfiguration per crawler. This is fine for most cases, but sometimes you want to be a little smarter than that. A typical use case is that you have:
- a limited pool of datacenter proxies with cheap traffic
- an unlimited pool of residential proxies with expensive traffic
- possibly also a 3rd party API that might be affordable but is not that stable
Instead of manually figuring out the correct setting for each run, or even reworking the proxy configuration in the middle of a run, you want to handle it dynamically. You want to start with the cheapest proxy groups and fall back to more expensive ones only if the cheaper ones start performing poorly. There might even be a use case for having different logic per route, but I think that is fairly unique.
cc @petrpatek @AndreyBykov
Motivation
As described above
Ideal solution or implementation, and any additional constraints
The solution needs at least two parts:
- Definition of default priorities. You want some proxy types to be preferred over others when they perform equally well.
- Dynamic scoring. You want to update the score of each configuration based on its performance, measured by a default or user-defined metric, usually the number of succeeded or failed requests. The algorithm should also allow going back to a preferred proxy after discarding it, e.g. by polling it with a small number of requests to see whether it has been unblocked.
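To make the two parts concrete, here is a minimal sketch of how priorities and dynamic scoring could fit together. Everything in it is an assumption for illustration: `ProxyRotator`, `ProxyTier`, `pickTier`, `report` and the option names are hypothetical and not part of Crawlee, and the exponential moving average is just one possible scoring rule.

```typescript
// Hypothetical sketch; none of these names exist in Crawlee.
interface ProxyTier {
    name: string;        // e.g. 'datacenter' or 'residential'
    proxyUrls: string[]; // proxies that belong to this tier
}

interface RotatorOptions {
    minScore?: number;      // below this score a tier counts as performing poorly
    probeRate?: number;     // share of requests used to re-test demoted tiers
    smoothing?: number;     // weight of the newest result in the moving average
    random?: () => number;  // injectable for deterministic tests
}

class ProxyRotator {
    private readonly states: { tier: ProxyTier; score: number }[];
    private readonly minScore: number;
    private readonly probeRate: number;
    private readonly smoothing: number;
    private readonly random: () => number;

    // `tiers` must be ordered from most to least preferred (the default priorities).
    constructor(tiers: ProxyTier[], options: RotatorOptions = {}) {
        if (tiers.length === 0) throw new Error('At least one proxy tier is required');
        this.states = tiers.map((tier) => ({ tier, score: 1 }));
        this.minScore = options.minScore ?? 0.5;
        this.probeRate = options.probeRate ?? 0.05;
        this.smoothing = options.smoothing ?? 0.2;
        this.random = options.random ?? Math.random;
    }

    pickTier(): ProxyTier {
        // Prefer the highest-priority tier that still performs well enough.
        let chosen = this.states.findIndex((s) => s.score >= this.minScore);
        if (chosen === -1) {
            // Every tier is performing poorly: use the one with the best score.
            chosen = 0;
            this.states.forEach((s, i) => {
                if (s.score > this.states[chosen].score) chosen = i;
            });
        }
        // Occasionally send a request through a more preferred tier that was
        // skipped, so it can recover once it is unblocked.
        if (chosen > 0 && this.random() < this.probeRate) {
            chosen = Math.floor(this.random() * chosen);
        }
        return this.states[chosen].tier;
    }

    // Update the tier's score as an exponential moving average of success (1) or failure (0).
    report(tierName: string, success: boolean): void {
        const state = this.states.find((s) => s.tier.name === tierName);
        if (!state) return;
        state.score = (1 - this.smoothing) * state.score + this.smoothing * (success ? 1 : 0);
    }
}

// Usage with placeholder proxy URLs.
const rotator = new ProxyRotator([
    { name: 'datacenter', proxyUrls: ['http://datacenter.example.com:8000'] },
    { name: 'residential', proxyUrls: ['http://residential.example.com:8000'] },
]);
const tier = rotator.pickTier();
rotator.report(tier.name, true);
```

The tier order encodes the default priorities, so a cheaper tier wins whenever it is healthy. The probe rate is what lets a discarded tier come back: a few successful probes raise its score above the threshold again.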
There are several existing implementations, using either:
- BasicCrawler - You just instantiate a class and call something like
proxyRotator.getBestProxyConfig(request.url)
- Local proxy chain server - All traffic is routed through a custom local super proxy server, which then chooses the proxy config.
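For the first approach, a class exposing `getBestProxyConfig(url)` could look roughly like the sketch below. Only the call shape comes from this issue. The class name, `ProxyConfig`, `reportResult` and the consecutive-failure rule are assumptions made for illustration, and the per-hostname bookkeeping is just one way to get different behavior per site.

```typescript
// Hypothetical sketch; only the getBestProxyConfig(url) call shape comes from the issue.
interface ProxyConfig {
    name: string;
    proxyUrl: string;
}

class HostAwareProxyRotator {
    // hostname -> consecutive failure count per config (same order as `configs`)
    private readonly failures = new Map<string, number[]>();
    private readonly configs: ProxyConfig[];
    private readonly maxFailures: number;

    // `configs` must be ordered from most to least preferred.
    constructor(configs: ProxyConfig[], maxFailures = 3) {
        if (configs.length === 0) throw new Error('At least one proxy config is required');
        this.configs = configs;
        this.maxFailures = maxFailures;
    }

    private countsFor(url: string): number[] {
        const host = new URL(url).hostname;
        let counts = this.failures.get(host);
        if (!counts) {
            counts = this.configs.map(() => 0);
            this.failures.set(host, counts);
        }
        return counts;
    }

    getBestProxyConfig(url: string): ProxyConfig {
        const counts = this.countsFor(url);
        const index = counts.findIndex((c) => c < this.maxFailures);
        // If every config is exhausted for this host, stay on the least preferred one.
        return this.configs[index === -1 ? this.configs.length - 1 : index];
    }

    reportResult(url: string, config: ProxyConfig, success: boolean): void {
        const counts = this.countsFor(url);
        const index = this.configs.indexOf(config);
        if (index === -1) return;
        // A success resets the streak; a failure extends it.
        counts[index] = success ? 0 : counts[index] + 1;
    }
}

// Usage with placeholder proxy URLs.
const proxyRotator = new HostAwareProxyRotator([
    { name: 'datacenter', proxyUrl: 'http://datacenter.example.com:8000' },
    { name: 'residential', proxyUrl: 'http://residential.example.com:8000' },
]);
const config = proxyRotator.getBestProxyConfig('https://example.com/products');
proxyRotator.reportResult('https://example.com/products', config, false);
```

This version is deliberately simple: a config that hits the failure limit for a host is never retried for that host, so a real implementation would still need some way to poll discarded configs again.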
I can send example implementations to you personally.
Alternative solutions or implementations
No response
Other context
No response
IMO this should be done on the platform as a feature of Apify Proxy.
@mnmkng While I'm all for a smart proxy on Apify, I would not want to kill this feature, because on the platform it would work only as a man-in-the-middle proxy. This could be a general feature for the broader community.
Closing, as we now have proxy tiers support. Feel free to reopen if you think something is still missing.