Too much quoting of URL fragments
I have several documents in which the fragment identifier of the URL contains colons (':'), which linkchecker reports as "not normalized". According to my reading of RFC 3986, ':' is allowed in a fragment, but the specification is quite unclear on whether the character must be escaped, since ':' is in the "reserved" set.
While researching this issue I found https://jazzy.id.au/static/fragment-encoding-test.html, which indicates that escaping as little as possible seems to be best for browser compatibility.
http://pythonhosted.org/uritools/ also does not escape the ':'.
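To make the RFC 3986 reading concrete, here is a minimal sketch that transcribes the `fragment` rule into a regular expression and checks a fragment containing a colon against it. The helper name `is_valid_fragment` and the example.com URL are illustrative assumptions, not part of linkchecker.

```python
import re
from urllib.parse import urlsplit

# Character classes transcribed from the RFC 3986 ABNF.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# fragment = *( pchar / "/" / "?" ), where pchar also allows ":" and "@".
FRAGMENT_RE = re.compile(
    r"(?:[" + UNRESERVED + SUB_DELIMS + r":@/?]|" + PCT_ENCODED + r")*"
)


def is_valid_fragment(fragment):
    """Return True if the string matches the RFC 3986 fragment rule.

    Hypothetical helper for illustration only.
    """
    return FRAGMENT_RE.fullmatch(fragment) is not None


# Hypothetical URL whose fragment contains an unescaped colon.
fragment = urlsplit("http://example.com/doc.html#sec:intro").fragment
print(fragment, is_valid_fragment(fragment))
```

Under this reading, a literal ':' in a fragment already conforms to the grammar, so percent-encoding it is not required for syntactic validity.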
Maybe the following patch is in order:
From 2d6f2eeecb6ed6ca31e2e3cf0760b538b2f39c48 Mon Sep 17 00:00:00 2001
Message-Id: <2d6f2eeecb6ed6ca31e2e3cf0760b538b2f39c48.1436776003.git.hahn@univention.de>
From: Philipp Hahn <[email protected]>
Date: Mon, 13 Jul 2015 10:24:44 +0200
Subject: [PATCH] Fix fragment identifier quoting
Organization: Univention GmbH, Bremen, Germany
According to <https://tools.ietf.org/html/rfc3986>:
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
---
linkcheck/url.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/linkcheck/url.py b/linkcheck/url.py
index 6263fb7..a3a1dd3 100644
--- a/linkcheck/url.py
+++ b/linkcheck/url.py
@@ -329,7 +329,7 @@ def url_norm (url, encoding=None):
     urlparts[0] = url_quote_part(urlparts[0], encoding=encoding) # scheme
     urlparts[1] = url_quote_part(urlparts[1], safechars='@:', encoding=encoding) # host
     urlparts[2] = url_quote_part(urlparts[2], safechars=_nopathquote_chars, encoding=encoding) # path
-    urlparts[4] = url_quote_part(urlparts[4], encoding=encoding) # anchor
+    urlparts[4] = url_quote_part(urlparts[4], safechars="!$&'()*+,-./;=?@_~", encoding=encoding) # anchor
     res = urlunsplit(urlparts)
     if url.endswith('#') and not urlparts[4]:
         # re-append trailing empty fragment
--
1.9.1
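A quick check that may be worth running before applying the patch: the proposed safe set does not itself list ':', although `pchar` allows it. The sketch below assumes that `url_quote_part` passes `safechars` straight through to urllib's `quote`; the issue does not show that function, so treat this as an assumption. The helper `quote_fragment` and the sample fragment are hypothetical.

```python
from urllib.parse import quote

# Safe set copied verbatim from the proposed patch.
PATCH_SAFE = "!$&'()*+,-./;=?@_~"
# Same set with ":" added, since pchar permits it (an assumption about the intent).
PATCH_SAFE_WITH_COLON = PATCH_SAFE + ":"


def quote_fragment(fragment, safe):
    """Percent-encode a fragment, leaving characters in `safe` untouched.

    Hypothetical stand-in for linkchecker's url_quote_part.
    """
    return quote(fragment, safe=safe)


fragment = "sec:intro"  # hypothetical fragment containing a colon
print(quote_fragment(fragment, PATCH_SAFE))             # sec%3Aintro
print(quote_fragment(fragment, PATCH_SAFE_WITH_COLON))  # sec:intro
```

If the assumption about `url_quote_part` holds, adding ':' to `safechars` would be needed for colons in fragments to pass through unescaped.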
Thank you for the issue report. Sadly, this project is dead, and a new team has taken over at https://github.com/linkcheck/linkchecker; for more details, please see #708. Please also close this issue and, if the problem still persists, report it afresh on the new repository at https://github.com/linkcheck/linkchecker/issues.