
When crawling, all domains appear to be DOWN

Open sunil3590 opened this issue 5 years ago • 5 comments

ISSUE I tried to crawl a regular domain (not .onion) and the status of the domain comes up as DOWN. I've tried this with multiple domains and even .onion domains, but the result is the same: all domains are DOWN.

SETUP I have AIL, Tor, and Splash all installed and running on a single machine, with one Docker instance of Splash listening on 8050 and Tor listening on 9050.

tcp        0      0 127.0.0.1:9050          0.0.0.0:*               LISTEN      18298/tor           
tcp6       0      0 :::8050                 :::*                    LISTEN      22611/docker-proxy 
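One thing worth checking with this setup (a suggestion, not a confirmed cause): the netstat output shows Tor bound to 127.0.0.1 only, but Splash runs inside a Docker container, where 127.0.0.1 refers to the container itself. If the crawler tells Splash to proxy through 127.0.0.1:9050, the connection fails inside the container and every domain looks DOWN. A quick reachability check, using a hypothetical `tcp_reachable` helper:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the host, Tor should answer on loopback:
print(tcp_reachable("127.0.0.1", 9050))
# From inside the Splash container, Tor must instead be reached via the
# Docker bridge IP (commonly 172.17.0.1), which also requires Tor to
# listen on that interface (e.g. "SOCKSPort 0.0.0.0:9050" in torrc).
```

Running the same check from inside the container (`docker exec`) against both addresses quickly tells you which side of the bridge is broken.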

Logs from Splash Docker

2020-04-10 08:56:20.300419 [-] "X.X.X.X" - - [10/Apr/2020:08:56:19 +0000] "GET / HTTP/1.1" 200 7679 "-" "python-requests/2.22.0"
2020-04-10 08:56:20.859058 [render] [140342956635136] loadFinished: unknown error
2020-04-10 08:56:20.860248 [events] {"path": "/execute", "rendertime": 0.007615327835083008, "maxrss": 176844, "load": [0.05, 0.19, 0.18], "fds": 60, "active": 0, "qsize": 0, "_id": 140342956635136, "method": "POST", "timestamp": 1586508980, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0", "args": {"cookies": [], "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"}, "lua_source": "\nfunction main(splash, args)\n    -- Default values\n    splash.js_enabled = true\n    splash.private_mode_enabled = true\n    splash.images_enabled = true\n    splash.webgl_enabled = true\n    splash.media_source_enabled = true\n\n    -- Force enable things\n    splash.plugins_enabled = true\n    splash.request_body_enabled = true\n    splash.response_body_enabled = true\n\n    splash.indexeddb_enabled = true\n    splash.html5_media_enabled = true\n    splash.http2_enabled = true\n\n    -- User defined\n    splash.resource_timeout = args.resource_timeout\n    splash.timeout = args.timeout\n\n    -- Allow to pass cookies\n    splash:init_cookies(args.cookies)\n\n    -- Run\n    ok, reason = splash:go{args.url}\n    if not ok and not reason:find(\"http\") then\n        return {\n            error = reason,\n            last_url = splash:url()\n        }\n    end\n    if reason == \"http504\" then\n        splash:set_result_status_code(504)\n        return ''\n    end\n\n    splash:wait{args.wait}\n    -- Page instrumentation\n    -- splash.scroll_position = {y=1000}\n    splash:wait{args.wait}\n    -- Response\n    return {\n        har = splash:har(),\n        html = splash:html(),\n        png = splash:png{render_all=true},\n        cookies = splash:get_cookies(),\n        last_url = splash:url()\n    }\nend\n", "resource_timeout": 30, "timeout": 30, "url": "http://somedomain.onion", "wait": 10, "uid": 
140342956635136}, "status_code": 200, "client_ip": "172.17.0.1"}
2020-04-10 08:56:20.860431 [-] "172.17.0.1" - - [10/Apr/2020:08:56:19 +0000] "POST /execute HTTP/1.1" 200 68 "-" "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
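For context, the crawler drives Splash through its HTTP `/execute` endpoint, and the `[events]` log entry above is Splash echoing back the request arguments it received. A minimal sketch of how such a request body is assembled (the `build_splash_payload` helper and its defaults are illustrative, not AIL's actual code):

```python
import json

def build_splash_payload(url, lua_source, timeout=30, resource_timeout=30, wait=10):
    """Assemble the JSON body for a POST to Splash's /execute endpoint."""
    return {
        "url": url,
        "lua_source": lua_source,   # the Lua script shown in the log above
        "timeout": timeout,
        "resource_timeout": resource_timeout,
        "wait": wait,
        "cookies": [],
    }

payload = build_splash_payload(
    "http://somedomain.onion",
    "function main(splash, args) ... end",
)
body = json.dumps(payload)
# This body would be POSTed to http://127.0.0.1:8050/execute; Splash
# replies with whatever the Lua script returns -- here the
# {error = reason, last_url = ...} table when splash:go fails, which is
# why the POST still logs a 200 even though the page load failed.
```

That last point matters for debugging: a 200 on `/execute` does not mean the target loaded, only that the Lua script ran to completion.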

This is the line of code in Splash that generates the error message above: https://github.com/scrapinghub/splash/blob/9fda128b8485dd5f67eb103cd30df8f325a90bb0/splash/engines/webkit/browser_tab.py#L446

sunil3590 avatar Apr 10 '20 09:04 sunil3590

Were you able to fix this? @sunil3590 Experiencing the same issue, Splash Down and all domains are down.

GaganBhat avatar Sep 28 '20 02:09 GaganBhat

@Terrtia I'm having a similar issue with Tor links, where I get a "SPLASH DOWN" error, but only with onion links. [screenshot]

The regular crawler, however, works. [screenshot]

GaganBhat avatar Oct 02 '20 02:10 GaganBhat

Hello, I have the same issue. Is there any update? Thanks.

TheFausap avatar Feb 08 '21 14:02 TheFausap

I may have found the error in the screen logs (`screen -r Crawlers_AIL`):

 File "/opt/AIL/bin/torcrawler/TorSplashCrawler.py", line 181, in parse
    error_retry = request.meta.get('error_retry', 0)
NameError: name 'request' is not defined
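That NameError makes sense: in a Scrapy `parse` callback, only `response` is in scope, so a bare `request` is undefined. The request that produced the response is reachable as `response.request`, and Scrapy also exposes `response.meta` as a shortcut for `response.request.meta`. A minimal sketch of the corrected pattern, using stand-in classes since the real ones come from Scrapy:

```python
# Hypothetical stand-ins for scrapy.Request / scrapy.http.Response,
# only to show where `meta` lives; not AIL's or Scrapy's actual code.
class Request:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

class Response:
    def __init__(self, request):
        self.request = request

    @property
    def meta(self):
        # Scrapy provides this same shortcut to request.meta
        return self.request.meta

def parse(response):
    # The crashing line referenced a bare `request`; the fix is to go
    # through the response object instead:
    error_retry = response.request.meta.get('error_retry', 0)  # or response.meta
    return error_retry

req = Request("http://somedomain.onion", meta={"error_retry": 2})
print(parse(Response(req)))  # prints 2
```

With `meta.get('error_retry', 0)`, a request that has never failed starts its retry counter at 0, which is presumably what line 181 of TorSplashCrawler.py intended.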

TheFausap avatar Feb 08 '21 14:02 TheFausap

@TheFausap @Terrtia did you find a fix for this? I also can't crawl any onion domain; they all appear to be DOWN.

matriceria avatar Feb 12 '22 21:02 matriceria