
Proxy Errors

Open jj2018jj opened this issue 1 year ago • 13 comments

If I do a scrape without the -proxies flag, it works as expected.

Also, if I use curl with the proxy URL to fetch the Google Maps URL, it works as expected.
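
For reference, the curl check was roughly of this form (proxy host, port, and credentials are placeholders):

  curl -x http://USER:PASS@PROXY_HOST:PORT 'https://www.google.com/maps/search/test?hl=en'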

However, when I add the -proxies flag, I see these errors in the output and it does not work.

This is with everything installed directly on Ubuntu 24.04, not using Docker.

{"level":"info","component":"scrapemate","time":"2024-12-13T17:55:43.067587274Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2024-12-13T17:56:43.068652477Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2024-12-13T17:56:43.068682503Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: 71d978af-8a91-45ee-9d56-dd630099dc31, Method: GET, URL: https://www.google.com/maps/search/test, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":60899.47342,"time":"2024-12-13T17:56:43.967269068Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2024-12-13T17:56:43.967304912Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2024-12-13T17:56:43.967340977Z","message":"scrapemate exited"}

Any tips on what I could be doing wrong?

Update 1: I tested a second proxy service, and I also tried setting the proxy with export http_proxy="proxy url here" and export https_proxy="proxy url here" while not using the -proxies flag, but it still didn't work.

Update 2: I tested a third proxy service using SOCKS5 with the -proxies flag and got this error:

{"level":"error","component":"scrapemate","error":"playwright: Browser does not support socks5 proxy authentication","time":"2024-12-14T22:00:49.630242703Z","message":"error while processing job"}

jj2018jj · Dec 13 '24

I just tested the functionality using a socks5 proxy:

ssh -D 1080 -q -C -N myserver

and then used this:

  go run main.go -input example-queries.txt -results demo.csv -proxies 'socks5://127.0.0.1:1080'

I also tried it from the web interface.

The traffic went through the proxy.

The difference here is that I used a SOCKS5 proxy without authentication.

I don't have access to a proxy with authentication at the moment.

I tried an HTTP proxy with authentication back when this was implemented, and it worked.

gosom · Dec 15 '24

@jj2018jj This is confirmed. I believe something may have changed in Playwright.

I will investigate and get back to you

gosom · Dec 16 '24

@gosom Any update on this?

I'm also getting this error when trying to use HTTPS proxies:

{"level":"info","component":"scrapemate","time":"2025-01-09T08:41:10.0880056Z","message":"starting scrapemate"} {"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2025-01-09T08:42:10.0895125Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2025-01-09T08:42:10.0895125Z","message":"exiting because of inactivity"} {"level":"info","component":"scrapemate","job":"Job{ID: e8d7abd1-4a27-4453-99f8-a3a640ff5daa, Method: GET, URL: https://www.google.com/maps/search/roofing+in+Worcester+Park/@0,0,15z, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":62254.0603,"time":"2025-01-09T08:42:12.343207Z","message":"job finished"} {"level":"error","component":"scrapemate","error":"context canceled","time":"2025-01-09T08:42:12.343207Z","message":"error while processing job"} {"level":"info","component":"scrapemate","time":"2025-01-09T08:42:12.343207Z","message":"scrapemate exited"}

EdwinUK · Jan 09 '25

@gosom Is there an update yet? I have a client who needs this really badly, and I have to tell him whether it will be fixed or how long it will take.

robinroloff · Jan 17 '25

@gosom I'm also having issues with a rotating proxy. No matter how I set up the proxy (using the proxy domain or the proxy IP in the appropriate format), I get this error:

{"level":"info","component":"scrapemate","time":"2025-01-28T15:00:43.445946977Z","message":"starting scrapemate"} {"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2025-01-28T15:01:43.444493455Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2025-01-28T15:01:43.444538295Z","message":"exiting because of inactivity"} {"level":"info","component":"scrapemate","job":"Job{ID: f5fbfcc5-40eb-4bc6-84e6-6c4977e2db1e, Method: GET, URL: https://www.google.com/maps/search/OBFUSCATED+OBFUSCATED+OBFUSCATED, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":90804.082496,"time":"2025-01-28T15:02:14.252046946Z","message":"job finished"} {"level":"error","component":"scrapemate","error":"context canceled","time":"2025-01-28T15:02:14.25207078Z","message":"error while processing job"} {"level":"info","component":"scrapemate","time":"2025-01-28T15:02:14.252090701Z","message":"scrapemate exited"}

...or similar. What happens is that the proxy does not load and the script hangs until it exits due to inactivity.

SparksIRL · Jan 29 '25

I tested two commercial proxies and neither of them worked. However, when I created a custom HTTP proxy on a VPS, as well as a SOCKS5 proxy, it worked.

It looks like the issue is a combination of Playwright and commercial proxies, possibly something like anti-bot protection on the proxy side.
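
For anyone who wants to reproduce the custom-VPS route, a minimal sketch assuming tinyproxy on a Debian/Ubuntu VPS (default port 8888; package name and config path may differ by distro):

  sudo apt install tinyproxy
  # in /etc/tinyproxy/tinyproxy.conf: keep the default Port 8888 and add an Allow line for your machine's public IP
  sudo systemctl restart tinyproxy
  go run main.go -input example-queries.txt -results demo.csv -proxies 'http://YOUR_VPS_IP:8888'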

gosom · Feb 08 '25

May I ask how one could run this project for a bigger workload in the current situation? Aren't proxies a hard requirement, or am I not aware of something?

BjoernRave · Feb 08 '25

May I ask how one could run this project for a bigger workload in the current situation? Aren't proxies a hard requirement, or am I not aware of something?

  1. Proxies are not a hard requirement.
  2. You can route the traffic from the Docker container via a VPN if you like (a rough sketch follows).
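
A minimal sketch of option 2, assuming the gluetun VPN container and the gosom/google-maps-scraper image (container name, VPN provider settings, and mounted paths are placeholders; adjust to your setup):

  docker run -d --name vpn --cap-add NET_ADMIN -e VPN_SERVICE_PROVIDER=YOUR_PROVIDER qmcgaw/gluetun
  docker run --rm --network container:vpn -v "$PWD":/data gosom/google-maps-scraper -input /data/example-queries.txt -results /data/results.csv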

gosom · Feb 09 '25

I'm encountering the same issue with the proxy setup. Without proxies, scraping quickly leads to frequent timeouts. After configuring Evomi proxies, Evomi's dashboard indicates that a tiny amount of data is indeed being consumed, but the scraper itself produces no results or simply stalls indefinitely.

Unfortunately, this makes the project unusable for my case.

pandinug · Mar 18 '25

@pandinug Using a proxy without authentication (username/password) works. I turned off authentication and it worked. Use IP auth instead.

khanhmb · Mar 25 '25

@khanhmb which proxies do you use?

BjoernRave · Apr 07 '25

@BjoernRave I use HTTP proxies bought in my country (e.g., M2 Proxy, Zing Proxy).

khanhmb · Apr 08 '25

Using a proxy without auth works. You just need to whitelist your IP with your proxy provider.
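
For example, after whitelisting your machine's public IP with the provider, the proxy URL is passed without credentials (host and port are placeholders):

  go run main.go -input example-queries.txt -results results.csv -proxies 'http://PROXY_HOST:PORT'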

raymundedgar · May 31 '25