Documentation incorrectly states that any software accepting CONNECT method could be used as a proxy
Hello,
I was trying to build my own image with a 3rd party HTTP proxy.
Expected Behavior
According to the documentation:
you can use every software which accept the CONNECT method (Squid, Tinyproxy, etc.).
Actual Behavior
This is not the case, because Scrapoxy expects to receive a 200 response from the http://xx.xx.xx.xx:3128/ping endpoint, which no other proxy software implements. Scrapoxy therefore considers the instance to be dead if you simply install Tinyproxy, Squid, etc.
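Roughly, the check amounts to something like the following (a sketch only; the real Scrapoxy code differs in details, and the 5-second timeout here is an assumption):

const http = require('http');

// Sketch of a liveness probe of the kind described above:
// GET http://<instance>:3128/ping must answer with status 200,
// otherwise the instance is treated as dead.
function pingInstance(host, port, callback) {
    const req = http.get({ host, port, path: '/ping', timeout: 5000 }, (res) => {
        callback(null, res.statusCode === 200);
        res.resume(); // discard the response body
    });
    req.on('timeout', () => req.destroy(new Error('ping timeout')));
    req.on('error', (err) => callback(err, false));
}

// Example: pingInstance('xx.xx.xx.xx', 3128, (err, alive) => console.log(alive));

A proxy that only speaks CONNECT has no reason to answer such a request, which is exactly why stock Tinyproxy or Squid fails the check.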
The problem with the built-in NodeJS proxy is that it does not support auth, and I want to avoid hosting an open proxy.
And, despite what the documentation claims, I cannot simply install Tinyproxy, because it is unaware that it needs to respond to /ping requests. The only solution I can think of (other than extending the existing NodeJS proxy with auth) is to put Tinyproxy behind e.g. a reverse proxy which responds to /ping requests and relays all other requests/responses to Tinyproxy listening internally on a different port. This is obviously far from ideal, and at the very least the documentation should be corrected to reflect that.
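For illustration, such a front could look roughly like this in Node (a sketch only, not production code; Tinyproxy is assumed to listen internally on 127.0.0.1:8888, which is a placeholder, and the front port 3128 comes from the issue above):

const http = require('http');
const net = require('net');

// Placeholder address of the internal Tinyproxy instance.
const TINYPROXY = { host: '127.0.0.1', port: 8888 };

const front = http.createServer((req, res) => {
    // Answer the Scrapoxy liveness probe ourselves.
    if (req.method === 'GET' && req.url === '/ping') {
        res.writeHead(200);
        return res.end('pong');
    }
    // Relay every other plain-HTTP proxy request to Tinyproxy unchanged
    // (req.url is in absolute form here, as clients send it to a proxy).
    const upstream = http.request({
        host: TINYPROXY.host,
        port: TINYPROXY.port,
        method: req.method,
        path: req.url,
        headers: req.headers,
    }, (upRes) => {
        res.writeHead(upRes.statusCode, upRes.headers);
        upRes.pipe(res);
    });
    upstream.on('error', () => res.destroy());
    req.pipe(upstream);
});

// Hand CONNECT tunnels (HTTPS) to Tinyproxy as raw TCP, forwarding any
// Proxy-Authorization header so Tinyproxy can still enforce Basic Auth.
front.on('connect', (req, clientSocket, head) => {
    const upstream = net.connect(TINYPROXY.port, TINYPROXY.host, () => {
        const lines = ['CONNECT ' + req.url + ' HTTP/1.1', 'Host: ' + req.url];
        if (req.headers['proxy-authorization']) {
            lines.push('Proxy-Authorization: ' + req.headers['proxy-authorization']);
        }
        upstream.write(lines.join('\r\n') + '\r\n\r\n');
        if (head && head.length) upstream.write(head);
        clientSocket.pipe(upstream);
        upstream.pipe(clientSocket);
    });
    upstream.on('error', () => clientSocket.destroy());
    clientSocket.on('error', () => upstream.destroy());
});

front.listen(3128);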
Thanks,
If anyone is interested, I have patched Tinyproxy to implement the HTTP ping response required by Scrapoxy; you can install it by building from source from the scrapoxy branch of nirvana-msu/tinyproxy.
Coupled with these fixes for #171 and #172, as well as a proper config to set the tag on AWS instances, I now have a properly working setup where proxy instances correctly validate Basic Auth.
I would say the documentation regarding the gotchas of using 3rd party proxy software still needs to be updated. It should also be made clear in the docs that built-in proxy instances do not validate the authorization header (and are thus open proxies available for anyone to use).
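For reference, validating that header amounts to something like this (a sketch only; this is not Scrapoxy's actual code):

// Check the Proxy-Authorization header of an incoming proxy request against
// expected Basic Auth credentials. Clients send it as
// "Proxy-Authorization: Basic <base64(user:pass)>".
function checkProxyAuth(req, expectedUser, expectedPassword) {
    const header = req.headers['proxy-authorization'] || '';
    const [scheme, encoded] = header.split(' ');
    if (scheme !== 'Basic' || !encoded) return false;
    const decoded = Buffer.from(encoded, 'base64').toString();
    const sep = decoded.indexOf(':');
    if (sep < 0) return false;
    return decoded.slice(0, sep) === expectedUser &&
           decoded.slice(sep + 1) === expectedPassword;
}

// A request handler would answer unauthenticated clients with
// 407 Proxy Authentication Required and a Proxy-Authenticate header.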
P.S. I've also used the (currently cheapest) t3a.nano AWS EC2 instance to build my image, and so far it's all working perfectly fine.
Hi, I set up your branch and compiled Tinyproxy - then I started Scrapoxy with
node server/index.js start conf.json
(using Node 14)
but when trying HTTP URLs I get this error:
2020-09-07T21:02:46.443Z - error: [Master] Error: request error from target (GET http://www.google.com/ on instance i-0390d07fcf94af17b@xxx:3128): message=Parse Error: Expected HTTP/, stack=Error: Parse Error: Expected HTTP/ at Socket.socketOnData (_http_client.js:509:22) at Socket.emit (events.js:314:20) at Socket.EventEmitter.emit (domain.js:486:12) at addChunk (_stream_readable.js:303:12) at readableAddChunk (_stream_readable.js:279:9) at Socket.Readable.push (_stream_readable.js:218:10)
and for HTTPS URLs:
2020-09-07T21:08:44.048Z - error: [Master] Error: socket error from client (CONNECT www.google.com:443 on instance i-057e068f4c4fb39aa@xxx:3128): message=read ECONNRESET, stack=Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:209:20), errno=-104, code=ECONNRESET, syscall=read
Which Node version did you use?
Corrected in 4.0.0 (all traffic is now encrypted with dedicated TLS certificates between the master and the proxies).
Hey there! Exciting news! Scrapoxy 4 is ready to rock. Check it out at Scrapoxy.io (explore the "get started" guide, deployment documentation, and more). I can't wait to hear your feedback on this new version! Send me your coolest screenshots with as many proxies as possible! Join the Discord community if you have any questions or just want to chat. You can also open a GitHub issue for any bug or feature request. See you soon! Fabien