Crawler is running into terminal connection refused socket failures

Open psivesely opened this issue 8 years ago • 5 comments

Edit: see https://github.com/freedomofpress/FingerprintSecureDrop/issues/4#issuecomment-228825080 for a better explanation and traceback. Don't know why this original report was so half-assed and lacked even the full traceback.

So the crawler is for the most part working very well. Where it runs into problems is what seems to be a Python IO/socket exception ([Errno 111] Connection refused). Once it hits this error, every remaining URL in the crawl fails almost instantly with the same exception. See the log at the bottom of this post.

I believe that this is actually caused by a bug in Python 3.5 (see https://bugs.python.org/issue26402), but this warrants further testing. The PPA we've been using at https://launchpad.net/~fkrull/+archive/ubuntu/deadsnakes?field.series_filter=trusty has not seen an updated version of Python 3.5 for Ubuntu 14.04 (trusty) since December. This is about our only choice for newer Python versions, and I've already done the work to migrate this script to Python 3.5 so that we could use a single virtual environment for both the HS sorting and crawling scripts. Since at this point in our research we don't really need to run the sorting script, I think I'll just break compatibility with it by making the necessary changes in the Ansible roles to install and use Python 3.3, which should hopefully fix things.

♫ Truckin' ♫
...
06:51:26 http://maghreb2z2zua2up.onion: exception: Remote end closed connection without response
06:51:26 http://radiohoodxwsn4es.onion: loading...
06:51:26 http://radiohoodxwsn4es.onion: exception: [Errno 111] Connection refused
06:51:26 http://tqjftqibbwtm4wmg.onion: loading...
06:51:26 http://tqjftqibbwtm4wmg.onion: exception: [Errno 111] Connection refused
06:51:26 http://newstarhrtqt6ua7.onion: loading...
06:51:26 http://newstarhrtqt6ua7.onion: exception: [Errno 111] Connection refused
...
And so on (it fails through the rest of the URLs pretty much instantly).
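A cascade like the one in this log could at least be caught early, so a run does not burn through the remaining URLs once every connection is being refused. The following is only a minimal sketch of that idea, not the crawler's actual code: `crawl`, `fetch`, and `max_consecutive_refused` are hypothetical names, and `fetch` stands in for whatever function loads a single page. It does nothing about the underlying cause; it just stops the crawl after a run of consecutive `ConnectionRefusedError`s.

```python
def crawl(urls, fetch, max_consecutive_refused=3):
    """Call fetch(url) for each URL, stopping early on a run of refusals.

    `fetch` is a hypothetical callable that loads one URL and raises on
    failure. Returns (results, aborted), where `results` maps each
    attempted URL to "ok" or an error string, and `aborted` is True if
    the crawl stopped before reaching the end of `urls`.
    """
    results = {}
    refused_streak = 0
    for url in urls:
        try:
            fetch(url)
        except ConnectionRefusedError as exc:
            # Errno 111 on Linux: count it toward the streak.
            refused_streak += 1
            results[url] = "exception: {}".format(exc)
            if refused_streak >= max_consecutive_refused:
                # Likely a local failure rather than dead sites: give up.
                return results, True
        except OSError as exc:
            # Other network errors (resets, remote disconnects) are
            # recorded but do not count toward the refusal streak.
            refused_streak = 0
            results[url] = "exception: {}".format(exc)
        else:
            refused_streak = 0
            results[url] = "ok"
    return results, False
```

The threshold is arbitrary. The reasoning is that a single refused connection may just be a dead site, whereas several in a row within the same second suggest that something local has stopped accepting connections, so the URLs that were skipped can be retried after a restart instead of being logged as failures.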

psivesely, May 18 '16 20:05