whoogle-search

[FEATURE] Multiple Proxies

Open kitchenutensils778 opened this issue 3 years ago • 9 comments

Is it possible to add multiple proxies and select the lowest-latency proxy in real time?

kitchenutensils778 avatar Aug 05 '21 17:08 kitchenutensils778

@benbusby I would like to give this a try. However, I would need some guidance on it 😅. Are there maybe some resources I could refer to?

vacom13 avatar Dec 18 '21 10:12 vacom13

@benbusby I will take it up

vacom13 avatar Feb 04 '22 05:02 vacom13

Thanks @vacom13, and sorry I missed your message from back in December! I think for an initial implementation you could just add support for multiple proxies, and retry requests using a different proxy (if multiple are configured) if one times out. Selecting the least latency proxy can be added later as an improvement.
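As a rough illustration of the "least latency" follow-up mentioned above, here is a minimal sketch in Python. It is not Whoogle code: `pick_fastest_proxy`, `probe`, and `clock` are hypothetical names, and `probe` is assumed to be any callable that sends one test request through the given proxy and raises `OSError` (which includes `TimeoutError`) on failure.

```python
import time
from typing import Callable, Iterable, Optional


def pick_fastest_proxy(
    proxy_paths: Iterable[str],
    probe: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the proxy whose probe finished fastest, or None if all failed.

    `probe(path)` is a hypothetical callable that performs one test request
    through the proxy at `path` and raises OSError on errors or timeouts.
    """
    best_path, best_latency = None, float('inf')
    for path in proxy_paths:
        start = clock()
        try:
            probe(path)
        except OSError:
            continue  # unreachable or timed-out proxies are skipped
        latency = clock() - start
        if latency < best_latency:
            best_path, best_latency = path, latency
    return best_path
```

Probing every proxy before each search would add latency of its own, so a real implementation would probably cache the result and re-measure only occasionally.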

benbusby avatar Feb 04 '22 23:02 benbusby

@benbusby no problem. And yes I will look into it

vacom13 avatar Feb 08 '22 05:02 vacom13

@benbusby Hey. I have actually been really busy with family and college work. I did think of a way to implement this, I suppose. Correct me if I am wrong. For now I should probably just get a list of working proxies from a site and give the user a checkbox in the config to select whether they want to use the proxies, right? And then I need to make sure that the search results come back within the given timeframe. After that I can give the user the ability to select the latency, right?

vacom13 avatar Mar 17 '22 10:03 vacom13

@benbusby there is this Python library, free-proxies. It basically scrapes https://www.sslproxies.org/ for proxies and returns a string for working proxies. I could use that and, in case it times out, try to get another proxy. As it gets the proxy in real time, I suppose there shouldn't be a problem since, according to the documentation, it checks whether the proxy is working.

vacom13 avatar Mar 22 '22 12:03 vacom13

Or I could just scrape the list myself for the first 5 proxies and then check those out?

vacom13 avatar Mar 22 '22 12:03 vacom13

Hey @vacom13, I think that's actually a good idea, but potentially a bit out of scope for this issue. I think all this issue should really support is multiple user-specified proxies. So if a user has access to multiple, they can specify them as a comma-separated string (or something along those lines).

So currently WHOOGLE_PROXY_LOC only accepts one IP:PORT string, but could be updated to have multiple and cycle through them if one of them returns an error response code. If a user just specified the proxy locations as a comma separated string such as IP:4000,IP:4001, then in the request module we could do something like:

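import os

# Drop empty entries: ''.split(',') returns [''], which is truthy
proxy_paths = [p.strip() for p in os.environ.get('WHOOGLE_PROXY_LOC', '').split(',') if p.strip()]
if proxy_paths:
    # ...
    for path in proxy_paths:
        # Validate a 200 response and no captcha from search URL
        ...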
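To make the loop above a bit more concrete, here is a minimal, self-contained sketch of the failover idea. It is only an illustration under assumptions, not Whoogle's actual request module: `parse_proxy_paths`, `fetch_with_failover`, and the `fetch` callable are hypothetical names, and the captcha check is a simple substring match standing in for whatever detection the project really uses.

```python
import os
from typing import Callable, List, Optional, Tuple


def parse_proxy_paths(raw: str) -> List[str]:
    """Split a comma-separated IP:PORT string, dropping blank entries."""
    return [p.strip() for p in raw.split(',') if p.strip()]


def fetch_with_failover(
    url: str,
    proxy_paths: List[str],
    fetch: Callable[[str, str], Tuple[int, str]],
) -> Optional[Tuple[str, str]]:
    """Try each proxy in order; return (proxy, body) for the first good response.

    `fetch(url, proxy)` is a hypothetical callable that returns
    (status_code, body) and raises OSError on connection errors or timeouts.
    """
    for path in proxy_paths:
        try:
            status, body = fetch(url, path)
        except OSError:
            continue  # connection error or timeout: move on to the next proxy
        # Assumed captcha check (substring match), not Whoogle's real logic
        if status == 200 and 'captcha' not in body.lower():
            return path, body
    return None  # every configured proxy failed


# Reads the same environment variable discussed above
configured = parse_proxy_paths(os.environ.get('WHOOGLE_PROXY_LOC', ''))
```

Passing the fetch function in as a parameter keeps the cycling logic independent of the HTTP client, so it can be exercised with a stub before any real proxies are available.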

I think your idea could be an entirely separate issue, but I'd like to personally look into the free-proxies library a bit first.

benbusby avatar Mar 22 '22 15:03 benbusby

@benbusby I have tried to work on this, but not having multiple working proxies available makes it confusing. I used a couple of free proxies but I ran into internal errors, maybe because the connection keeps timing out. The free proxies did work in a test script I created, but whenever I add the proxy to the Whoogle env and run it, it always leads to some problem. I also ran into a rate-limiting issue with one proxy. Anyway, I will keep at it.

vacom13 avatar May 01 '22 23:05 vacom13