
torcrawler.py timeout 504

Open xme opened this issue 6 years ago • 12 comments

Since I upgraded my AIL instance, I can't crawl any onion site. All requests return a "504" error. Is there a way to increase the timeout for reaching the site via Tor, or is this related to another issue?

Example:

2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}
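The event line above is a timestamp and tag followed by one JSON object, so the fields that matter for a timeout can be pulled out mechanically. A minimal sketch, assuming the log keeps this layout; the function name `parse_splash_event` and the shortened `sample` string are illustrative and not part of AIL or Splash:

```python
import json


def parse_splash_event(line: str) -> dict:
    """Extract the timeout-related fields from one '[events]' log line."""
    # The JSON payload starts at the first '{'; everything before it is
    # the timestamp and the "[events]" tag.
    payload = json.loads(line[line.index("{"):])
    error = payload.get("error") or {}
    return {
        "url": (payload.get("args") or {}).get("url"),
        "status_code": payload.get("status_code"),
        "error_type": error.get("type"),
        "timeout": (error.get("info") or {}).get("timeout"),
        "rendertime": payload.get("rendertime"),
    }


# Shortened copy of the log line above (only the fields used here).
sample = (
    '2019-05-27 12:09:57.796865 [events] {"args": {"wait": 10, '
    '"url": "http://winkledgargsurly.onion"}, '
    '"rendertime": 30.014784336090088, '
    '"error": {"type": "GlobalTimeoutError", "error": 504, '
    '"description": "Timeout exceeded rendering page", '
    '"info": {"timeout": 30}}, "status_code": 504}'
)

summary = parse_splash_event(sample)
print(summary)
```

Read this way, the entry shows a render that ran for about 30 seconds against a 30-second limit, which points at the rendering timeout rather than at an HTTP error returned by the onion site itself.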

xme avatar May 27 '19 12:05 xme

hey @xme !

Do you have the same issue with regular websites (crawled via the UI)?

The default Tor proxy provided by the package management system is not up to date. Using the Tor proxy provided by the Tor Project (#344) may solve the problem.

You're right, I should add an option to change the default Splash timeout.
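For context on the timeout itself: Splash's `render.json` endpoint takes a `timeout` argument (30 seconds by default, which matches the `"timeout": 30` in the log above), and the largest value it accepts is capped by the `--max-timeout` option given when Splash starts (90 seconds by default). How AIL passes these values is not shown in this thread, so the following is only a sketch of a direct request; the host `127.0.0.1`, the 60-second value and the function name `build_render_request` are assumptions:

```python
import json
import urllib.request


def build_render_request(splash_url: str, target_url: str,
                         timeout: float = 60.0,
                         wait: float = 10.0) -> urllib.request.Request:
    """Build (but do not send) a render.json request with an explicit timeout."""
    body = {
        "url": target_url,
        "timeout": timeout,  # render timeout in seconds, capped by --max-timeout
        "wait": wait,        # same "wait" value as in the logged request
        "html": 1,
    }
    return urllib.request.Request(
        splash_url.rstrip("/") + "/render.json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_render_request("http://127.0.0.1:8050",
                           "http://winkledgargsurly.onion")
print(req.full_url, req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=70)
```

A value above the server-side maximum is rejected by Splash, so raising the per-request timeout beyond 90 seconds would also require starting Splash with a larger `--max-timeout`.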

Terrtia avatar May 27 '19 14:05 Terrtia

Only Onion websites apparently... How can I switch to the tor proxy provided by TorProject?

xme avatar May 27 '19 18:05 xme

Follow these installation steps: https://2019.www.torproject.org/docs/debian.html.en#ubuntu (Option two: Tor on Ubuntu or Debian)

This should overwrite your /etc/tor/torrc configuration file. You need to edit this file as described:

  • Allow Tor to bind to any interface or to the Docker interface (by default it binds to 127.0.0.1 only) in /etc/tor/torrc: SOCKSPort 0.0.0.0:9050 or SOCKSPort 172.17.0.1:9050
  • Add the following line in /etc/tor/torrc: SOCKSPolicy accept 172.17.0.0/16 (for Docker on Linux, the localhost IP is 172.17.0.1; this should be adapted for other platforms)
  • Restart the tor proxy: sudo service tor restart

(https://github.com/CIRCL/AIL-framework/blob/master/HOWTO.md#installationconfiguration)
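The two torrc settings above can be checked with a few lines of Python before restarting Tor. This is a rough sketch, not a full torrc parser: it only looks for the exact addresses named in the steps, and the function name `check_torrc` and the sample text are illustrative.

```python
def check_torrc(text: str) -> dict:
    """Report whether a torrc contains the two settings described above."""
    socks_ports, socks_policies = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "socksport":
            socks_ports.append(value)
        elif key == "sockspolicy":
            socks_policies.append(value)
    return {
        # True if Tor listens on all interfaces or on the Docker interface.
        "binds_beyond_localhost": any(
            p.startswith(("0.0.0.0:", "172.17.0.1:")) for p in socks_ports
        ),
        # True if the Docker subnet from the steps above is accepted.
        "accepts_docker_subnet": "accept 172.17.0.0/16" in socks_policies,
    }


sample = """
# /etc/tor/torrc (excerpt)
SOCKSPort 0.0.0.0:9050
SOCKSPolicy accept 172.17.0.0/16
"""
print(check_torrc(sample))
```

On a real system the text would come from reading /etc/tor/torrc; if either value is False, the Splash container is probably unable to reach the proxy at 172.17.0.1:9050.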

Terrtia avatar May 28 '19 08:05 Terrtia

@xme did you get a solution for this?

annetteshajan avatar Jun 09 '20 04:06 annetteshajan

@annetteshajan I'm running the latest stable tor package (as suggested) but it did not improve. Most of the crawled Onion sites are down. I tested some of them via a Tor browser and it's also impossible to reach them. I presume that they are indeed down. Sometimes, I get a peak of available sites... Strange...

xme avatar Jun 09 '20 06:06 xme

@xme Are you sure your tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output. Ideally I should; however, the tor service does say that it is running on my system.

annetteshajan avatar Jun 09 '20 06:06 annetteshajan

Another question, are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives: * A screen is already launched, please kill it before creating another one. I have even killed all the screens and reinstalled it, but it still gives the same output.

annetteshajan avatar Jun 09 '20 06:06 annetteshajan

@xme Are you sure your tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output. Ideally I should; however, the tor service does say that it is running on my system.

# curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs
Congratulations. This browser is configured to use Tor.
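When the curl check prints nothing, it can help to separate "nothing is answering on the SOCKS port" from "the port answers but traffic does not go through Tor". A minimal sketch of the first half, using only the standard library to perform the SOCKS5 greeting (version 5, no-authentication method); the function name `socks5_port_answers` and the default timeout are assumptions, and the host and port are the ones from the curl command:

```python
import socket


def socks5_port_answers(host: str = "127.0.0.1", port: int = 9050,
                        timeout: float = 5.0) -> bool:
    """Return True if a SOCKS5 server at host:port accepts the no-auth method."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Greeting: version 5, one method offered, method 0x00 (no auth).
            sock.sendall(b"\x05\x01\x00")
            reply = sock.recv(2)
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False
    # Expected reply: version 5, selected method 0x00.
    return reply == b"\x05\x00"
```

Calling `socks5_port_answers("localhost", 9050)` on the machine running Tor only shows that a SOCKS5 proxy is listening there; the check.torproject.org request is still needed to confirm that traffic actually goes through Tor.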

xme avatar Jun 09 '20 07:06 xme

Another question, are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives:

  • A screen is already launched, please kill it before creating another one. I have even killed all the screens and reinstalled it, but it still gives the same output.

Default setup... In a docker, 3 instances

xme avatar Jun 09 '20 07:06 xme

Example:

2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}

What command is this? @xme

annetteshajan avatar Jun 09 '20 13:06 annetteshajan

https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855 @annetteshajan It seems that you have a screen already running for the root user. Could you kill it before relaunching the mentioned script?

mokaddem avatar Jul 21 '20 05:07 mokaddem

Never mind. Seems to be solved in https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855

mokaddem avatar Jul 21 '20 05:07 mokaddem

Fixed in AIL v5.0

Terrtia avatar Jul 17 '23 09:07 Terrtia