Documentation incorrectly states that any software accepting CONNECT method could be used as a proxy
Hello,
I was trying to build my own image with a 3rd party HTTP proxy.
Expected Behavior
According to the documentation:
you can use every software which accept the CONNECT method (Squid, Tinyproxy, etc.).
Actual Behavior
This is not the case, because Scrapoxy expects to receive a 200 response from the http://xx.xx.xx.xx:3128/ping endpoint, which no other proxy software implements. Scrapoxy therefore considers the instance to be dead if you simply install Tinyproxy, Squid, etc.
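Roughly, the check amounts to something like the following (a sketch only; the real Scrapoxy code differs in details, and the 5-second timeout here is an assumption):

const http = require('http');

// Sketch of a liveness probe of the kind described above:
// GET http://<instance>:3128/ping must answer with status 200,
// otherwise the instance is treated as dead.
function pingInstance(host, port, callback) {
    const req = http.get({ host, port, path: '/ping', timeout: 5000 }, (res) => {
        callback(null, res.statusCode === 200);
        res.resume(); // discard the response body
    });
    req.on('timeout', () => req.destroy(new Error('ping timeout')));
    req.on('error', (err) => callback(err, false));
}

// Example: pingInstance('xx.xx.xx.xx', 3128, (err, alive) => console.log(alive));

A proxy that only speaks CONNECT has no reason to answer such a request, which is exactly why stock Tinyproxy or Squid fails the check.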
The problem with the built-in NodeJS proxy is that it does not support auth, and I want to avoid hosting an open proxy.
And, despite what the documentation claims, I cannot simply install Tinyproxy, because it is unaware that it needs to respond to /ping requests. The only solution I can think of (other than extending the existing NodeJS proxy with auth) is to put Tinyproxy behind e.g. a reverse proxy which responds to /ping requests and relays all other requests/responses to Tinyproxy listening internally on a different port. This is obviously far from ideal, and at the very least the documentation should be corrected to reflect that.
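For illustration, such a front could look roughly like this in Node (a sketch only, not production code; Tinyproxy is assumed to listen internally on 127.0.0.1:8888, which is a placeholder, and the front port 3128 comes from the issue above):

const http = require('http');
const net = require('net');

// Placeholder address of the internal Tinyproxy instance.
const TINYPROXY = { host: '127.0.0.1', port: 8888 };

const front = http.createServer((req, res) => {
    // Answer the Scrapoxy liveness probe ourselves.
    if (req.method === 'GET' && req.url === '/ping') {
        res.writeHead(200);
        return res.end('pong');
    }
    // Relay every other plain-HTTP proxy request to Tinyproxy unchanged
    // (req.url is in absolute form here, as clients send it to a proxy).
    const upstream = http.request({
        host: TINYPROXY.host,
        port: TINYPROXY.port,
        method: req.method,
        path: req.url,
        headers: req.headers,
    }, (upRes) => {
        res.writeHead(upRes.statusCode, upRes.headers);
        upRes.pipe(res);
    });
    upstream.on('error', () => res.destroy());
    req.pipe(upstream);
});

// Hand CONNECT tunnels (HTTPS) to Tinyproxy as raw TCP, forwarding any
// Proxy-Authorization header so Tinyproxy can still enforce Basic Auth.
front.on('connect', (req, clientSocket, head) => {
    const upstream = net.connect(TINYPROXY.port, TINYPROXY.host, () => {
        const lines = ['CONNECT ' + req.url + ' HTTP/1.1', 'Host: ' + req.url];
        if (req.headers['proxy-authorization']) {
            lines.push('Proxy-Authorization: ' + req.headers['proxy-authorization']);
        }
        upstream.write(lines.join('\r\n') + '\r\n\r\n');
        if (head && head.length) upstream.write(head);
        clientSocket.pipe(upstream);
        upstream.pipe(clientSocket);
    });
    upstream.on('error', () => clientSocket.destroy());
    clientSocket.on('error', () => upstream.destroy());
});

front.listen(3128);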
Thanks,
If anyone is interested, I have patched Tinyproxy to implement the HTTP ping response required by Scrapoxy; you can install it by building from source from the scrapoxy branch of nirvana-msu/tinyproxy.
Coupled with these fixes for #171 and #172, as well as a proper config to set the tag on AWS instances, I now have a properly working setup where proxy instances correctly validate Basic Auth.
I would say the documentation regarding the gotchas of using 3rd party proxy software still needs to be updated. It should also be made clear in the docs that built-in proxy instances do not validate the authorization header (and are thus open proxies available for anyone to use).
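For reference, validating that header amounts to something like this (a sketch only; this is not Scrapoxy's actual code):

// Check the Proxy-Authorization header of an incoming proxy request against
// expected Basic Auth credentials. Clients send it as
// "Proxy-Authorization: Basic <base64(user:pass)>".
function checkProxyAuth(req, expectedUser, expectedPassword) {
    const header = req.headers['proxy-authorization'] || '';
    const [scheme, encoded] = header.split(' ');
    if (scheme !== 'Basic' || !encoded) return false;
    const decoded = Buffer.from(encoded, 'base64').toString();
    const sep = decoded.indexOf(':');
    if (sep < 0) return false;
    return decoded.slice(0, sep) === expectedUser &&
           decoded.slice(sep + 1) === expectedPassword;
}

// A request handler would answer unauthenticated clients with
// 407 Proxy Authentication Required and a Proxy-Authenticate header.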
P.S. I've also used the (currently cheapest) t3a.nano AWS EC2 instance to build my image, and so far it's all working perfectly fine.
Hi, I set up your branch and compiled Tinyproxy - then I started Scrapoxy with
node server/index.js start conf.json
(using Node 14)
but when trying HTTP URLs I get this error:
2020-09-07T21:02:46.443Z - error: [Master] Error: request error from target (GET http://www.google.com/ on instance i-0390d07fcf94af17b@xxx:3128): message=Parse Error: Expected HTTP/, stack=Error: Parse Error: Expected HTTP/ at Socket.socketOnData (_http_client.js:509:22) at Socket.emit (events.js:314:20) at Socket.EventEmitter.emit (domain.js:486:12) at addChunk (_stream_readable.js:303:12) at readableAddChunk (_stream_readable.js:279:9) at Socket.Readable.push (_stream_readable.js:218:10)
and for HTTPS URLs:
2020-09-07T21:08:44.048Z - error: [Master] Error: socket error from client (CONNECT www.google.com:443 on instance i-057e068f4c4fb39aa@xxx:3128): message=read ECONNRESET, stack=Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:209:20), errno=-104, code=ECONNRESET, syscall=read
Which Node version did you use?
Corrected in 4.0.0 (all traffic is now encrypted with dedicated TLS certificates between the master and the proxies).
Hey there! Exciting news! Scrapoxy 4 is ready to rock. Check it out at Scrapoxy.io (explore the "get started" guide, deployment documentation, and more). I can't wait to hear your feedback on this new version! Send me your coolest screenshots with as many proxies as possible! Join the Discord community if you have any questions or just want to chat. You can also open a GitHub issue for any bug or feature request. See you soon! Fabien