AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
The problem appears when I check a local web site: `linkchecker --ignore-url=^mailto: www.sintez.dev`
I've tried changing `url_data.proxy_type` to `url_data.proxytype`, but that just produces a new error in /usr/local/lib/python2.7/dist-packages/requests/sessions.py: AttributeError: 'set' object has no attribute 'setdefault'
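For what it's worth, the second error follows from how the `proxies` argument gets built: `{url_data.proxy_type, url_data.proxy}` in robots_txt.py is a *set* literal, while requests expects a dict mapping scheme to proxy URL. A minimal sketch (the proxy values below are just placeholders):

```python
# {x, y} is a set literal; {x: y} is the dict that requests' `proxies`
# argument expects (scheme -> proxy URL).
proxy_type, proxy = "http", "http://192.168.1.254:3128/"  # placeholder values

buggy = {proxy_type, proxy}   # set -- what robots_txt.py builds
fixed = {proxy_type: proxy}   # dict -- scheme mapped to proxy URL

# requests internally calls proxies.setdefault(...), which only dicts have,
# hence "AttributeError: 'set' object has no attribute 'setdefault'".
print(hasattr(buggy, "setdefault"))  # False
print(hasattr(fixed, "setdefault"))  # True
```

So renaming the attribute only trades the AttributeError on `proxy_type` for the set/`setdefault` one further down in requests.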
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
line: self.check_url_data(url_data)
locals:
self =
Statistics: Downloaded: 0B. No statistics available since no URLs were checked.
That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-10-09 14:16:08+004 (0.06 seconds)
Requests: 2.4.1
Qt: 4.8.6 / PyQt: 4.10.4
Modules: Sqlite, Gconf
Local time: 2014-10-09 14:16:08+004
sys.argv: ['/usr/bin/linkchecker', 'www.sintezr.dev']
http_proxy = 'http://192.168.1.254:3128/'
no_proxy = 'localhost,127.0.0.0/8,::1,.dev,192.168.,10.10.20.67'
LANGUAGE = 'ru:en'
LANG = 'ru_RU.UTF-8'
Default locale: ('ru', 'UTF-8')
Same issue here:
$ linkchecker --ignore-url=^mailto: http://github.com
INFO 2014-11-03 11:01:58,561 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.3 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
Start checking at 2014-11-03 11:01:58-004
********** Oops, I did it again. *************
You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:
- the URL or file you are testing
- the system information below
When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"
Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 104, in check_url
line: self.check_url_data(url_data)
locals:
self = <local> <Checker(CheckThread-http://github.com, started 140448599136000)>
self.check_url_data = <local> <bound method Checker.check_url_data of <Checker(CheckThread-http://github.com, started 140448599136000)>>
url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 120, in check_url_data
line: check_url(url_data, self.logger)
locals:
check_url = <global> <function check_url at 0x7fbcbdd3fed8>
url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
self = <local> <Checker(CheckThread-http://github.com, started 140448599136000)>
self.logger = <local> <linkcheck.director.logger.Logger object at 0x7fbcbd5e1310>
File "/usr/lib/python2.7/site-packages/linkcheck/director/checker.py", line 52, in check_url
line: url_data.check()
locals:
url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
url_data.check = <local> <bound method HttpUrl.check of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
File "/usr/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 424, in check
line: self.local_check()
locals:
self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
self.local_check = <local> <bound method HttpUrl.local_check of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
File "/usr/lib/python2.7/site-packages/linkcheck/checker/urlbase.py", line 442, in local_check
line: self.check_connection()
locals:
self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
self.check_connection = <local> <bound method HttpUrl.check_connection of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
File "/usr/lib/python2.7/site-packages/linkcheck/checker/httpurl.py", line 128, in check_connection
line: if not self.allows_robots(self.url):
locals:
self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
self.allows_robots = <local> <bound method HttpUrl.allows_robots of <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>>
self.url = <local> u'http://github.com', len = 17
File "/usr/lib/python2.7/site-packages/linkcheck/checker/httpurl.py", line 66, in allows_robots
line: return self.aggregate.robots_txt.allows_url(self)
locals:
self = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
self.aggregate = <local> <linkcheck.director.aggregator.Aggregate object at 0x7fbcbd5e1350>
self.aggregate.robots_txt = <local> <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>
self.aggregate.robots_txt.allows_url = <local> <bound method RobotsTxt.allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>>
File "/usr/lib/python2.7/site-packages/linkcheck/cache/robots_txt.py", line 49, in allows_url
line: return self._allows_url(url_data, roboturl)
locals:
self = <local> <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>
self._allows_url = <local> <bound method RobotsTxt._allows_url of <linkcheck.cache.robots_txt.RobotsTxt object at 0x7fbcbd5e1290>>
url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
roboturl = <local> u'http://github.com/robots.txt', len = 28
File "/usr/lib/python2.7/site-packages/linkcheck/cache/robots_txt.py", line 71, in _allows_url
line: kwargs["proxies"] = {url_data.proxy_type, url_data.proxy}
locals:
kwargs = <local> {'auth': None, 'session': <requests.sessions.Session object at 0x7fbcbd5e1510>}
url_data = <local> <http link, base_url=u'http://github.com', parent_url=None, base_ref=None, recursion_level=0, url_connection=None, line=0, column=0, page=0, name=u'', anchor=u'', cache_url=http://github.com>
url_data.proxy_type = <local> !AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
url_data.proxy = <local> '192.1.1.250:8080', len = 16
AttributeError: 'HttpUrl' object has no attribute 'proxy_type'
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.8 (default, Sep 24 2014, 18:26:21)
[GCC 4.9.1 20140903 (prerelease)] on linux2
Requests: 2.4.3
Qt: 4.8.6 / PyQt: 4.11.2
Modules: Sqlite
Local time: 2014-11-03 11:01:58-004
sys.argv: ['/usr/bin/linkchecker', '--ignore-url=^mailto:', 'http://github.com']
http_proxy = 'http://192.1.1.250:8080'
LANGUAGE = ''
LANG = 'en_CA.UTF-8'
Default locale: ('en', 'UTF-8')
Statistics:
Downloaded: 0B.
No statistics available since no URLs were checked.
That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-11-03 11:01:58-004 (0.05 seconds)
******** LinkChecker internal error, over and out ********
WARNING 2014-11-03 11:01:58,617 CheckThread-http://github.com internal error occurred
Running under ArchLinux x86_64.
Yes, same issue here, I believe.
(linkchecker)[col@wave-service-checker linkchecker]$ linkchecker --check-extern http://httpbin.org/get
LinkChecker 9.3 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
Start checking at 2014-11-12 12:33:24+000
********** Oops, I did it again. *************
You have found an internal error in LinkChecker. Please write a bug report at https://github.com/wummel/linkchecker/issues and include the following information:
- the URL or file you are testing
- the system information below
When using the commandline client:
- your commandline arguments and any custom configuration files.
- the output of a debug run with option "-Dall"
Not disclosing some of the information above due to privacy reasons is ok. I will try to help you nonetheless, but you have to give me something I can work with ;) .
Traceback (most recent call last):
File "/var/www/linkchecker/lib/python2.7/site-packages/LinkChecker-9.3-py2.7-linux-x86_64.egg/linkcheck/director/checker.py", line 104, in check_url
line: self.check_url_data(url_data)
locals:
self =
url_data =
Statistics: Downloaded: 0B. No statistics available since no URLs were checked.
That's it. 0 links in 0 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-11-12 12:33:24+000 (0.04 seconds)
******** LinkChecker internal error, over and out ********
WARNING 2014-11-12 12:33:24,458 CheckThread-http://httpbin.org/get internal error occurred
I don't have time to merge this at the minute, sorry, but change line 61 of robots_txt.py to:
if hasattr(url_data, "proxy") and hasattr(url_data, "proxy_type"):
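Putting that guard together with a dict-shaped `proxies` value, the relevant block would look roughly like this. This is a sketch reconstructed from the traceback above, not the exact committed patch, and `FakeUrlData` is just an illustrative stand-in for linkcheck's HttpUrl object:

```python
# Hypothetical reconstruction of the fix in linkcheck/cache/robots_txt.py.
class FakeUrlData(object):
    """Stand-in for linkcheck's HttpUrl, just for illustration."""
    proxy_type = "http"
    proxy = "192.1.1.250:8080"

url_data = FakeUrlData()
kwargs = {"auth": None}

# Guard against url_data objects that lack the proxy attributes, and build
# a dict (scheme -> proxy URL) instead of the accidental set literal
# {url_data.proxy_type, url_data.proxy} from the traceback.
if hasattr(url_data, "proxy") and hasattr(url_data, "proxy_type"):
    kwargs["proxies"] = {url_data.proxy_type: url_data.proxy}

print(kwargs["proxies"])  # {'http': '192.1.1.250:8080'}
```

The `hasattr` guard alone stops the AttributeError; the dict literal is what keeps requests from choking on a set afterwards.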
See also commits https://github.com/wummel/linkchecker/commit/52337f82cbc89c93929c16a8dd3eb0df60150300 and https://github.com/wummel/linkchecker/commit/4e56eceb358ae9e9c25833adbc44b761d321b586.