proxyscrape

Optimize retrieving proxies from store

Open JaredLGillespie opened this issue 6 years ago • 0 comments

When retrieving a proxy, the following occurs:

  1. If refresh is needed, scrape new proxies from sources
  2. Filter out any proxies that are on the blacklist
  3. Pick a random proxy and return it

If a refresh doesn't occur and the blacklist doesn't change, then steps 1 and 2 should be skipped and the previously filtered pool reused. Also, we can use inverted indexes for filtering based on anonymity, country code, etc. Performing intersections (or unions) on the sets provided by the inverted indexes should then yield the pool of applicable proxies. This should drastically improve performance when a large number of proxies are retrieved.
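A rough sketch of the idea, not the library's actual code: `Proxy`, `IndexedStore`, and the `code` / `anonymous` field names are hypothetical and only illustrate how per-field inverted indexes plus a cached pool could fit together. Values for the same field are unioned, different fields are intersected, and the cache is dropped whenever the stored proxies or the blacklist change.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical record type; the field names are assumptions for illustration.
Proxy = namedtuple("Proxy", ["host", "port", "code", "anonymous"])


class IndexedStore:
    """Sketch of a store keeping one inverted index per filterable field."""

    FIELDS = ("code", "anonymous")

    def __init__(self):
        self._proxies = set()
        self._blacklist = set()
        # field name -> field value -> set of proxies having that value
        self._index = {f: defaultdict(set) for f in self.FIELDS}
        # filter key -> frozenset of matching, non-blacklisted proxies
        self._cache = {}

    def add(self, proxy):
        self._proxies.add(proxy)
        for f in self.FIELDS:
            self._index[f][getattr(proxy, f)].add(proxy)
        self._cache.clear()  # stored proxies changed, cached pools are stale

    def blacklist(self, proxy):
        self._blacklist.add(proxy)
        self._cache.clear()  # blacklist changed, cached pools are stale

    def pool(self, **filters):
        """Each filter maps a field name to an iterable of accepted values."""
        key = tuple(sorted((f, frozenset(v)) for f, v in filters.items()))
        if key not in self._cache:
            result = self._proxies - self._blacklist
            for f, values in key:
                # Union across accepted values of one field ...
                matching = set().union(
                    *(self._index[f].get(v, set()) for v in values)
                )
                # ... then intersect across fields.
                result &= matching
            self._cache[key] = frozenset(result)
        return self._cache[key]

    def get(self, **filters):
        pool = self.pool(**filters)
        # Converting to a tuple is O(n); a real store might cache that too.
        return random.choice(tuple(pool)) if pool else None
```

With this layout, repeated calls such as `store.get(code=["us"], anonymous=[True])` only recompute the pool after an `add` or `blacklist` call.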

Also, please add a method for retrieving the number of proxies currently stored. The method should optionally take a filter and, when one is given, return only the number of proxies that match it.
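One possible shape for such a method, as a minimal sketch: `CountingStore`, `count`, and the `filter_fn` parameter are made-up names, and the filter is assumed here to be a plain predicate rather than whatever filter format the library ends up using.

```python
from collections import namedtuple

# Hypothetical record type; the field names are assumptions for illustration.
Proxy = namedtuple("Proxy", ["host", "port", "code", "anonymous"])


class CountingStore:
    """Sketch of a store that can report how many proxies it holds."""

    def __init__(self, proxies=()):
        self._proxies = set(proxies)

    def count(self, filter_fn=None):
        """Return the number of stored proxies, optionally only those
        for which filter_fn(proxy) is truthy."""
        if filter_fn is None:
            return len(self._proxies)
        return sum(1 for p in self._proxies if filter_fn(p))
```

For example, `store.count()` would give the total, while `store.count(lambda p: p.code == "us")` would give only the proxies whose country code is `us`.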

JaredLGillespie, Nov 09 '19 19:11