proxyscrape
Optimize retrieving proxies from store
When retrieving a proxy, the following occurs:
1. If a refresh is needed, scrape new proxies from the sources
2. Filter out every blacklisted proxy
3. Pick a random proxy and return it
If no refresh occurred and the blacklist has not changed since the last retrieval, steps 1 and 2 can be skipped and the previously filtered pool reused. We can also use inverted indexes for filtering on anonymity, country code, etc.: each index maps a field value to the set of proxies that have it. Taking the union of the sets for the accepted values of one field, then intersecting the results across fields, yields the pool of applicable proxies without scanning every proxy. This should significantly improve performance when a large number of proxies are retrieved.
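The idea above could be sketched roughly as follows. This is only an illustration, not the library's current implementation: `IndexedStore`, its methods, the `Proxy` fields, and the filter format (a dict mapping a field name to a set of accepted values) are all assumptions made for the example.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical proxy record; the field names are assumptions for this sketch.
Proxy = namedtuple('Proxy', ['host', 'port', 'code', 'anonymous', 'type'])


class IndexedStore:
    """Illustrative store with one inverted index per filterable field."""

    INDEXED_FIELDS = ('code', 'anonymous', 'type')

    def __init__(self):
        self._proxies = set()
        self._blacklist = set()
        # field name -> field value -> set of proxies having that value
        self._indexes = {f: defaultdict(set) for f in self.INDEXED_FIELDS}
        # Cached (proxies - blacklist); None means it must be recomputed.
        self._available = None

    def update(self, proxies):
        """Add freshly scraped proxies and invalidate the cached pool."""
        for proxy in proxies:
            self._proxies.add(proxy)
            for field in self.INDEXED_FIELDS:
                self._indexes[field][getattr(proxy, field)].add(proxy)
        self._available = None

    def blacklist(self, proxy):
        """Blacklist a proxy and invalidate the cached pool."""
        self._blacklist.add(proxy)
        self._available = None

    def _pool(self, filter_opts=None):
        # Steps 1 and 2 are skipped unless a refresh or blacklist change
        # has invalidated the cache.
        if self._available is None:
            self._available = self._proxies - self._blacklist
        pool = self._available
        for field, values in (filter_opts or {}).items():
            index = self._indexes[field]
            # Union over the accepted values of one field ...
            matching = set().union(*(index.get(v, set()) for v in values))
            # ... then intersection across fields.
            pool = pool & matching
        return pool

    def get_proxy(self, filter_opts=None):
        """Return a random applicable proxy, or None if there is none."""
        pool = self._pool(filter_opts)
        return random.choice(tuple(pool)) if pool else None
```

For example, `store.get_proxy({'code': {'us', 'uk'}, 'anonymous': {True}})` would only touch the sets for those three index entries. Converting the pool to a tuple for `random.choice` is still linear in the pool size, so a real implementation might want to cache that as well.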
Also, please add a method for retrieving the number of proxies currently stored. The method should optionally take a filter, in which case it returns only the number of proxies that match it.
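A minimal sketch of what such a count could look like, written as a standalone function so it does not presume anything about the store's internals. The name `count_proxies`, the `Proxy` fields, and the filter format (field name mapped to a set of accepted values) are hypothetical.

```python
from collections import namedtuple

# Hypothetical proxy record; the field names are assumptions for this sketch.
Proxy = namedtuple('Proxy', ['host', 'port', 'code', 'anonymous', 'type'])


def count_proxies(proxies, filter_opts=None):
    """Count stored proxies, optionally only those matching filter_opts.

    filter_opts maps a field name to the set of accepted values; a proxy
    matches when every listed field has one of its accepted values.
    """
    if not filter_opts:
        return len(proxies)
    return sum(
        1 for proxy in proxies
        if all(getattr(proxy, field) in values
               for field, values in filter_opts.items())
    )
```

With inverted indexes in place, the filtered count would simply be the size of the intersected pool rather than a scan like the one above.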