crawlee

Smart proxy configuration rotator

metalwarrior665 opened this issue 2 years ago • 2 comments

Which package is the feature request for? If unsure which one to select, leave blank

No response

Feature

Currently, we only allow a single ProxyConfiguration per crawler. This is fine for most cases, but sometimes you want to be a little smarter than that. A typical use case is that you have:

  • a limited pool of datacenter proxies with cheap traffic
  • an unlimited pool of residential proxies with expensive traffic
  • possibly also a third-party API that might be affordable but is not very stable

Instead of manually figuring out the correct setting for each run, or even reworking the proxy configuration in the middle of a run, you want to handle it dynamically: start with the cheapest proxy groups and only fall back to more expensive solutions if the cheaper ones start performing poorly. There might even be a use case for different logic per route, but I think that is fairly rare.
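The cheapest-first fallback described above could look roughly like the following minimal sketch. It assumes tiers are ordered from cheapest to most expensive and that "performing poorly" means the failure share over a sliding window of recent requests exceeds a threshold. ProxyTier, TierEscalator, windowSize and maxErrorRate are hypothetical names chosen for illustration, not Crawlee APIs.

```typescript
// Hypothetical sketch: ProxyTier and TierEscalator are illustrative names,
// not part of Crawlee's API.
interface ProxyTier {
  name: string;        // e.g. "datacenter" or "residential"
  proxyUrls: string[]; // placeholder proxy URLs belonging to this tier
}

class TierEscalator {
  private readonly tiers: ProxyTier[];   // ordered cheapest first
  private readonly windowSize: number;   // how many recent results to judge by
  private readonly maxErrorRate: number; // escalate when failures exceed this share
  private tierIndex = 0;
  private recent: boolean[] = [];        // sliding window of request outcomes

  constructor(tiers: ProxyTier[], windowSize = 20, maxErrorRate = 0.3) {
    if (tiers.length === 0) throw new Error("at least one tier is required");
    this.tiers = tiers;
    this.windowSize = windowSize;
    this.maxErrorRate = maxErrorRate;
  }

  get currentTier(): ProxyTier {
    return this.tiers[this.tierIndex];
  }

  // Record the outcome of one request made through the current tier.
  record(success: boolean): void {
    this.recent.push(success);
    if (this.recent.length > this.windowSize) this.recent.shift();
    if (this.recent.length < this.windowSize) return; // not enough data yet
    const failures = this.recent.filter((ok) => !ok).length;
    const isLastTier = this.tierIndex === this.tiers.length - 1;
    if (failures / this.windowSize > this.maxErrorRate && !isLastTier) {
      this.tierIndex += 1; // fall back to the next, more expensive tier
      this.recent = [];    // judge the new tier on its own results
    }
  }
}
```

This only ever escalates. Going back to a cheaper tier once it recovers needs some form of probing, which is part of what the issue asks for.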

cc @petrpatek @AndreyBykov

Motivation

As described above

Ideal solution or implementation, and any additional constraints

The solution needs at least two parts:

  1. Definition of default priorities. You want some proxy types to be preferred when performance is equal.
  2. Dynamic scoring. You want to update the score of each configuration based on its performance, measured by a default or user-defined metric (usually succeeded vs. failed requests). The algorithm should also allow going back to a preferred proxy after it was previously discarded, e.g. by probing it with a small share of requests to see whether it has been unblocked.
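The two parts could be combined roughly as in this minimal sketch, under stated assumptions. The score is an exponential moving average of success. Configurations whose scores are within a tolerance of the best count as "equal performance" and are then ranked by a static priority. Returning to a discarded proxy uses epsilon-greedy probing, where a small share of picks goes to a non-best configuration. ScoredProxyRotator, alpha, tolerance and probeRate are hypothetical names, and the default values are arbitrary.

```typescript
// Hypothetical sketch: names and defaults are illustrative, not a Crawlee API.
interface ScoredConfig {
  name: string;
  priority: number; // lower = preferred when performance is comparable
  score: number;    // moving average of success, 1 = always succeeds
}

interface RotatorOptions {
  alpha?: number;          // weight of the newest result in the moving average
  tolerance?: number;      // scores this close to the best count as equal
  probeRate?: number;      // share of picks spent probing non-best configs
  random?: () => number;   // injectable for deterministic tests
}

class ScoredProxyRotator {
  private readonly configs: ScoredConfig[];
  private readonly alpha: number;
  private readonly tolerance: number;
  private readonly probeRate: number;
  private readonly random: () => number;

  constructor(defs: { name: string; priority: number }[], options: RotatorOptions = {}) {
    if (defs.length === 0) throw new Error("at least one configuration is required");
    this.configs = defs.map((d) => ({ ...d, score: 1 })); // optimistic start
    this.alpha = options.alpha ?? 0.2;
    this.tolerance = options.tolerance ?? 0.05;
    this.probeRate = options.probeRate ?? 0.05;
    this.random = options.random ?? Math.random;
  }

  // Choose a configuration name for the next request.
  pick(): string {
    const best = this.bestConfig();
    const others = this.configs.filter((c) => c !== best);
    // Epsilon-greedy probing gives discarded configs a chance to recover.
    if (others.length > 0 && this.random() < this.probeRate) {
      return others[Math.floor(this.random() * others.length)].name;
    }
    return best.name;
  }

  // Feed back the outcome of a request made with the named configuration.
  report(name: string, success: boolean): void {
    const config = this.configs.find((c) => c.name === name);
    if (!config) throw new Error(`unknown configuration: ${name}`);
    config.score = (1 - this.alpha) * config.score + this.alpha * (success ? 1 : 0);
  }

  private bestConfig(): ScoredConfig {
    const top = Math.max(...this.configs.map((c) => c.score));
    // Among configs performing about as well as the best, prefer by priority.
    const comparable = this.configs.filter((c) => top - c.score <= this.tolerance);
    return comparable.reduce((a, b) => (b.priority < a.priority ? b : a));
  }
}
```

A discarded configuration only receives new results through the probe picks, so probeRate effectively controls how quickly a cheap proxy that has been unblocked is noticed again.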

There are several existing implementations, using either:

  1. BasicCrawler: you just instantiate a class and call something like proxyRotator.getBestProxyConfig(request.url).
  2. A local proxy chain server: all traffic is routed through a custom local super proxy server, which then chooses the proxy configuration for each request.
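Passing request.url to the rotator in the BasicCrawler approach suggests keeping statistics per target site, so a block on one domain does not push every other domain onto expensive proxies. The following minimal sketch assumes exactly that. HostAwareProxyRotator, ProxyConfigEntry and reportResult are hypothetical, and getBestProxyConfig only mirrors the call shape from the comment, not an existing implementation. Configurations are ranked by a Laplace-smoothed success rate per hostname, and ties go to the earlier (default-priority) entry.

```typescript
// Hypothetical sketch: these names are illustrative, not Crawlee APIs.
interface ProxyConfigEntry {
  name: string;
  proxyUrl: string; // placeholder proxy URL
}

interface Stats {
  successes: number;
  failures: number;
}

class HostAwareProxyRotator {
  private readonly configs: ProxyConfigEntry[]; // ordered by default priority
  private readonly statsByHost = new Map<string, Map<string, Stats>>();

  constructor(configs: ProxyConfigEntry[]) {
    if (configs.length === 0) throw new Error("at least one configuration is required");
    this.configs = configs;
  }

  getBestProxyConfig(url: string): ProxyConfigEntry {
    const stats = this.statsFor(url);
    let best = this.configs[0];
    let bestRate = -1;
    for (const config of this.configs) {
      const s = stats.get(config.name) ?? { successes: 0, failures: 0 };
      // Laplace smoothing: an untried config on this host scores 0.5.
      const rate = (s.successes + 1) / (s.successes + s.failures + 2);
      // Strict ">" keeps the earlier, preferred config on ties.
      if (rate > bestRate) {
        best = config;
        bestRate = rate;
      }
    }
    return best;
  }

  reportResult(url: string, configName: string, success: boolean): void {
    const stats = this.statsFor(url);
    const s = stats.get(configName) ?? { successes: 0, failures: 0 };
    if (success) s.successes += 1;
    else s.failures += 1;
    stats.set(configName, s);
  }

  // Statistics are kept separately for each hostname.
  private statsFor(url: string): Map<string, Stats> {
    const host = new URL(url).hostname;
    let stats = this.statsByHost.get(host);
    if (!stats) {
      stats = new Map<string, Stats>();
      this.statsByHost.set(host, stats);
    }
    return stats;
  }
}
```

This sketch does not probe. A configuration that has fallen behind on a host is only chosen again after the others' smoothed rates drop below it.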

I can send example implementations to you personally.

Alternative solutions or implementations

No response

Other context

No response

metalwarrior665 • Nov 9, 2022 14:11

IMO this should be done on the platform as a feature of Apify Proxy.

mnmkng • Feb 1, 2023 19:02

@mnmkng While I'm all for a smart proxy on Apify, I would not want to kill this feature, because on the platform it would only be available as a man-in-the-middle proxy. This could be a general feature for the broader community.

metalwarrior665 • Feb 1, 2023 20:02

Closing, as we now have proxy tiers support. Feel free to reopen if you think something is still missing.
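For reference, here is a minimal configuration sketch of the proxy tiers feature mentioned above. It assumes Crawlee v3.10 or later, where ProxyConfiguration accepts a tieredProxyUrls option: an array of tiers, each an array of proxy URLs, with null meaning "no proxy". The proxy URLs are placeholders. The crawler is expected to move to a higher tier when requests on the current one get blocked. Check the Crawlee documentation for the exact switching behaviour.

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Tiers are listed from cheapest to most expensive (placeholder URLs).
const proxyConfiguration = new ProxyConfiguration({
    tieredProxyUrls: [
        [null], // first try without any proxy
        ['http://datacenter-proxy.example.com:8000'],
        ['http://residential-proxy.example.com:8000'],
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ request, proxyInfo }) {
        // proxyInfo describes the proxy used for this request, if any.
        console.log(request.url, proxyInfo?.url);
    },
});
```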

B4nan • Jun 11, 2024 14:06