
Internal error on external redirect URL

Status: open. bgoldowsky opened this issue 9 years ago (2 comments).

This is an intermittent error; it occasionally happens when running a full scan of our site. The error message refers to an external Google redirect-style URL (https://goo.gl/maps/...).

  • OS: Scientific Linux 7.1 (RHEL clone)
  • Python: 2.7.5
  • LinkChecker: 9.3, released 16.7.2014
  • Command line: /usr/bin/linkchecker --no-status --check-extern http://www.cast.org

Config file with comments removed:

[output]
verbose=0
[text]
parts=parenturl,name,url,realurl,result,info,warning,outro
[gml]
[dot]
[csv]
[sql]
[html]
[blacklist]
[xml]
[gxml]
[sitemap]
[checking]
sslverify=/etc/ssl/certs/ca-bundle.crt
maxnumurls=1000
[filtering]
[authentication]
[AnchorCheck]

Here is the error report:

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:

  • the URL or file you are testing
  • the system information below

When using the commandline client:

  • your commandline arguments and any custom configuration files.
  • the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/linkcheck/director/checker.py", line 104, in check_url
    line: self.check_url_data(url_data)
    locals:
      self = <Checker(CheckThread-https://goo.gl/maps/MDumG, started 140146718906112)>
      self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-https://goo.gl/maps/MDumG, started 140146718906112)>>
      url_data = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
  File "/usr/lib64/python2.7/site-packages/linkcheck/director/checker.py", line 120, in check_url_data
    line: check_url(url_data, self.logger)
    locals:
      check_url = <function check_url at 0x2a8a5f0>
      url_data = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self = <Checker(CheckThread-https://goo.gl/maps/MDumG, started 140146718906112)>
      self.logger = <linkcheck.director.logger.Logger object at 0x2bed4d0>
  File "/usr/lib64/python2.7/site-packages/linkcheck/director/checker.py", line 52, in check_url
    line: url_data.check()
    locals:
      url_data = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      url_data.check = <bound method HttpUrl.check of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/urlbase.py", line 424, in check
    line: self.local_check()
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self.local_check = <bound method HttpUrl.local_check of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/urlbase.py", line 442, in local_check
    line: self.check_connection()
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self.check_connection = <bound method HttpUrl.check_connection of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/httpurl.py", line 135, in check_connection
    line: self.send_request(request)
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self.send_request = <bound method HttpUrl.send_request of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
      request = <PreparedRequest [GET]>
  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/httpurl.py", line 165, in send_request
    line: self._send_request(request, **kwargs)
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self._send_request = <bound method HttpUrl._send_request of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
      request = <local> <PreparedRequest [GET]>
      kwargs = <local> {'timeout': 60, 'stream': True, 'allow_redirects': False, 'verify': '/etc/ssl/certs/ca-bundle.crt'}

/usr/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:769: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)

  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/httpurl.py", line 172, in _send_request
    line: self._add_ssl_info()
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self._add_ssl_info = <bound method HttpUrl._add_ssl_info of <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>>
  File "/usr/lib64/python2.7/site-packages/linkcheck/checker/httpurl.py", line 199, in _add_ssl_info
    line: self.ssl_cert = httputil.x509_to_dict(cert)
    locals:
      self = <https link, base_url=u'https://goo.gl/maps/MDumG', parent_url=u'http://www.cast.org', base_ref=None, recursion_level=1, url_connection=None, line=1770, column=25, page=0, name=u'40 Foundry Street ', anchor=u'', cache_url=https://goo.gl/maps/MDumG>
      self.ssl_cert = None
      httputil = <module 'linkcheck.httputil' from '/usr/lib64/python2.7/site-packages/linkcheck/httputil.pyc'>
      httputil.x509_to_dict = <function x509_to_dict at 0x29118c0>
      cert = <X509 object at 0x7f7664153170>
  File "/usr/lib64/python2.7/site-packages/linkcheck/httputil.py", line 47, in x509_to_dict
    line: parsedtime = asn1_generaltime_to_seconds(notAfter)
    locals:
      parsedtime =
      asn1_generaltime_to_seconds = <function asn1_generaltime_to_seconds at 0x2911938>
      notAfter = '20150707000000Z', len = 15
  File "/usr/lib64/python2.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds
    line: res = datetime.strptime(timestr, timeformat + 'Z')
    locals:
      res = None
      datetime = <type 'datetime.datetime'>
      datetime.strptime = <built-in method strptime of type object at 0x7f7678581b80>
      timestr = '20150707000000Z', len = 15
      timeformat = '%Y%m%d%H%M%S', len = 12
AttributeError: _strptime

System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.5 (default, May 5 2014, 09:13:10)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Requests: 2.6.0
Modules: Sqlite
Local time: 2015-04-27 02:30:29-004
sys.argv: ['/usr/bin/linkchecker', '--no-status', '--check-extern', 'http://www.cast.org']
LANG = 'en_US.UTF-8'
Default locale: ('en', 'UTF-8')

******** LinkChecker internal error, over and out ********

WARNING 2015-04-27 02:30:29,402 CheckThread-https://goo.gl/maps/MDumG internal error occurred
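For reference, the innermost frame in the traceback above is converting the certificate's notAfter field, an ASN.1 GeneralizedTime string ('20150707000000Z', where the trailing Z means UTC), by calling datetime.strptime with the format '%Y%m%d%H%M%S' plus a literal 'Z'. The following is a minimal standalone sketch of that step. It is a hypothetical re-creation for illustration, not LinkChecker's actual httputil code; the TIMEFORMAT constant, the None-on-failure behavior, and the assumption that the result is meant as seconds since the Unix epoch are all mine.

```python
import calendar
from datetime import datetime

# Hypothetical re-creation of the failing step; NOT LinkChecker's real code.
# Assumption: the caller wants seconds since the Unix epoch, interpreted as UTC.
TIMEFORMAT = "%Y%m%d%H%M%S"


def asn1_generaltime_to_seconds(timestr):
    """Convert an ASN.1 GeneralizedTime such as '20150707000000Z' to epoch seconds.

    Returns None if the string does not match the expected format.
    """
    try:
        # The trailing 'Z' (UTC designator) is matched as a literal character.
        parsed = datetime.strptime(timestr, TIMEFORMAT + "Z")
    except ValueError:
        return None
    # calendar.timegm() treats the time tuple as UTC, unlike time.mktime().
    return calendar.timegm(parsed.timetuple())


print(asn1_generaltime_to_seconds("20150707000000Z"))  # 1436227200
print(asn1_generaltime_to_seconds("not a date"))  # None
```

Note that the conversion itself is correct for this input; the AttributeError in the report does not come from the date string but from the strptime call failing before it ever parses anything.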

bgoldowsky, Apr 28 '15 13:04

Hi bgoldowsky,

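Not sure if you have figured it out yet, but it turns out this is a bug in Python's strptime implementation (https://bugs.python.org/issue7980). Basically, datetime.strptime imports the _strptime module lazily on first use, and that import is not thread-safe: when several check threads hit it at the same time, one of them can fail with "AttributeError: _strptime", which is exactly the error in the traceback above and would explain why it only happens intermittently.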
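A commonly used workaround for this bug is to import _strptime once, in the main thread, before any worker threads start, so no thread ever triggers the lazy import concurrently. The sketch below shows that pattern with plain threading.Thread workers standing in for LinkChecker's check threads; the worker function, NOT_AFTER, and NUM_THREADS are illustrative assumptions, not LinkChecker code. The eager import is harmless on interpreters where the race does not occur, and the snippet avoids Python-3-only APIs so it runs on 2.7 as well.

```python
# Workaround for https://bugs.python.org/issue7980: datetime.strptime imports
# the _strptime module lazily on first use, and that import can fail with
# "AttributeError: _strptime" when several threads reach it at once.
# Importing it eagerly, in the main thread, before any worker threads start
# avoids the race.
import _strptime  # noqa: F401  (imported only for this side effect)

import threading
from datetime import datetime

NOT_AFTER = "20150707000000Z"  # notAfter value taken from the traceback
NUM_THREADS = 8  # illustrative only


def parse_in_thread(results, index):
    # Each hypothetical worker parses the same GeneralizedTime string, as
    # concurrent certificate checks would. On failure the slot stays None.
    results[index] = datetime.strptime(NOT_AFTER, "%Y%m%d%H%M%S" + "Z")


def run_workers():
    results = [None] * NUM_THREADS
    threads = [
        threading.Thread(target=parse_in_thread, args=(results, i))
        for i in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_workers()
print(all(r == datetime(2015, 7, 7) for r in results))  # True
```

The important design point is placement: the import has to run before the first worker thread starts, which is why it usually sits at module level in the program's entry point rather than inside the parsing function.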

I am going to submit a pull request to implement a workaround, but considering that this repo hasn't been updated since Sep 2014, the turnaround might be a bit long.

faddy, Aug 25 '15 21:08

Thank you for the issue report. Sadly, this project is dead, but a new team maintains it at https://github.com/linkcheck/linkchecker; for more details, please see #708. Also, please close this issue and report it again on the new repo (https://github.com/linkcheck/linkchecker/issues) if the problem still persists.

dpalic, Oct 30 '17 07:10