Proxy Errors
If I run a scrape without the -proxies flag, it works as expected.
Likewise, if I use curl with the proxy URL against the Google Maps URL, it also works as expected.
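For reference, the curl check was along these lines (placeholder proxy URL and credentials; the actual end-to-end request is shown commented out at the bottom):

```shell
# Placeholder credentials; percent-encode special characters before
# embedding them in the proxy URL ('@' becomes '%40', etc.).
PROXY_USER='user'
PROXY_PASS='p%40ss'   # "p@ss" with '@' percent-encoded
PROXY="http://${PROXY_USER}:${PROXY_PASS}@proxy.example.com:8080"
echo "$PROXY"
# End-to-end check through the proxy; prints the HTTP status code
# (200 means the proxy reached Google Maps):
# curl -x "$PROXY" -sS -o /dev/null -w '%{http_code}\n' 'https://www.google.com/maps'
```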
However, when I add the -proxies flag, I see the errors below in the output and it does not work.
This is with everything installed directly on Ubuntu 24.04, not using Docker.
```
{"level":"info","component":"scrapemate","time":"2024-12-13T17:55:43.067587274Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2024-12-13T17:56:43.068652477Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2024-12-13T17:56:43.068682503Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: 71d978af-8a91-45ee-9d56-dd630099dc31, Method: GET, URL: https://www.google.com/maps/search/test, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":60899.47342,"time":"2024-12-13T17:56:43.967269068Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2024-12-13T17:56:43.967304912Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2024-12-13T17:56:43.967340977Z","message":"scrapemate exited"}
```
Any tips on what I could be doing wrong?
Update 1: I tested a second proxy service, and I also tried setting the proxy with export http_proxy="proxy url here" and export https_proxy="proxy url here" while not using the -proxies flag, but it still didn't work.
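For completeness, the environment-variable attempt was a sketch like the following, with a placeholder URL. Worth noting: as far as I can tell, the Chromium browser that Playwright launches does not honor http_proxy/https_proxy, which would explain why this route had no effect on the browser traffic.

```shell
# Placeholder proxy URL; set both lowercase and uppercase variants,
# since different tools read different spellings.
PROXY='http://user:pass@proxy.example.com:8080'
export http_proxy="$PROXY" https_proxy="$PROXY"
export HTTP_PROXY="$PROXY" HTTPS_PROXY="$PROXY"
```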
Update 2: I tested a third proxy service using socks5 with the -proxies flag and got this error:

```
{"level":"error","component":"scrapemate","error":"playwright: Browser does not support socks5 proxy authentication","time":"2024-12-14T22:00:49.630242703Z","message":"error while processing job"}
```
I just tested the functionality using a socks5 proxy:

```
ssh -D 1080 -q -C -N myserver
```

and then ran:

```
go run main.go -input example-queries.txt -results demo.csv -proxies 'socks5://127.0.0.1:1080'
```

I also tried it from the web interface. The traffic went through the proxy.
The difference here is that I used a socks5 proxy without authentication; I don't have access to a proxy with authentication at the moment.
I did test the HTTP proxy with authentication back when it was first implemented, and it worked then.
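Chromium, which Playwright drives, accepts socks5 proxies but rejects ones that require username/password authentication, which matches the error above. As a rough pre-flight check, a hypothetical helper like this (not part of the project) could flag such proxy strings before the browser ever sees them:

```go
package main

import (
	"fmt"
	"net/url"
)

// hasProxyAuth reports the scheme of a proxy URL and whether it embeds
// credentials. Chromium-based browsers reject socks5 proxies that
// require username/password authentication.
func hasProxyAuth(raw string) (scheme string, auth bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	return u.Scheme, u.User != nil, nil
}

func main() {
	for _, p := range []string{
		"socks5://127.0.0.1:1080",         // no auth: the browser accepts it
		"socks5://user:pass@1.2.3.4:1080", // embedded auth: the browser rejects it
	} {
		s, a, _ := hasProxyAuth(p)
		fmt.Printf("%s auth=%v\n", s, a)
	}
}
```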
@jj2018jj this is confirmed. I believe that something may have changed in Playwright.
I will investigate and get back to you.
@gosom Any update on this?
Also getting this error when trying to use HTTPS proxies:
```
{"level":"info","component":"scrapemate","time":"2025-01-09T08:41:10.0880056Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2025-01-09T08:42:10.0895125Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2025-01-09T08:42:10.0895125Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: e8d7abd1-4a27-4453-99f8-a3a640ff5daa, Method: GET, URL: https://www.google.com/maps/search/roofing+in+Worcester+Park/@0,0,15z, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":62254.0603,"time":"2025-01-09T08:42:12.343207Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2025-01-09T08:42:12.343207Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2025-01-09T08:42:12.343207Z","message":"scrapemate exited"}
```
@gosom Is there an update yet? I have a client who needs this urgently, and I have to tell him whether it will be fixed and how long it will take.
@gosom I'm having issues with a rotating proxy as well. No matter how I set up the proxy (using the proxy domain or proxy IP in the appropriate format), I get this error:
```
{"level":"info","component":"scrapemate","time":"2025-01-28T15:00:43.445946977Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2025-01-28T15:01:43.444493455Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2025-01-28T15:01:43.444538295Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: f5fbfcc5-40eb-4bc6-84e6-6c4977e2db1e, Method: GET, URL: https://www.google.com/maps/search/OBFUSCATED+OBFUSCATED+OBFUSCATED, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":90804.082496,"time":"2025-01-28T15:02:14.252046946Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2025-01-28T15:02:14.25207078Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2025-01-28T15:02:14.252090701Z","message":"scrapemate exited"}
```
...or similar. What happens is that the proxy does not load, and the script hangs until it exits due to inactivity.
I tested two commercial proxies and neither seemed to work. However, when I set up a custom HTTP proxy on a VPS, and likewise a socks5 proxy, both worked.
It looks like the issue is a combination of Playwright and commercial proxies, perhaps anti-bot protection or something similar.
May I ask how one could run this project for a bigger workload in the current situation? Aren't proxies a hard requirement, or am I not aware of something?
- Proxies are not a hard requirement.
- You can route the traffic from the Docker container through a VPN if you like.
I'm encountering the same issue with the proxy setup. Without proxies, scraping quickly leads to frequent timeouts. After configuring Evomi proxies, Evomi's dashboard indicates that a tiny amount of data is indeed being consumed, but the scraper itself produces no results or simply stalls indefinitely.
Unfortunately, this makes the project unusable for my case.
@pandinug Using a proxy without authentication (username/password) works. I turned authentication off, and then it worked. Use IP auth instead.
@khanhmb which proxies do you use?
@BjoernRave I use HTTP proxies bought in my country (e.g. M2 Proxy, Zing Proxy...).
Using a proxy without auth works. You just need to whitelist your IP with your proxy provider.