Failed to load stylesheet from Windows network drive
I am trying to work with an HTML input that links a CSS stylesheet located on a Windows network share (a UNC path starting with a double backslash). Simplified example:
import weasyprint
s = '<html><head><link rel="stylesheet" type="text/css" href="style.css"></head><body>TEST</body></html>'
base_url = r"\\NETWORK\folder" # replace with real network path that contains a style.css file
weasyprint.HTML(string=s, base_url=base_url).write_pdf("test.pdf")
I get this error message:
Failed to load stylesheet at file:///NETWORK/folder/style.css : URLError: <urlopen error [WinError 3] Das System kann den angegebenen Pfad nicht finden: '\\NETWORK\\folder\\style.css'>
Note that file:///NETWORK/folder/style.css starts with only three slashes, so one / is missing. (The German message translates to "The system cannot find the path specified".)
After some introspection I think I found the culprit here: https://github.com/Kozea/WeasyPrint/blob/master/weasyprint/urls.py#L130.
Up to this point base_url is "file:////NETWORK/folder" (four slashes) and url is "style.css", but urljoin removes one of the slashes and we end up with the broken "file:///NETWORK/folder/style.css" (three slashes).
Like this (the base in this snippet has no trailing slash, so urljoin also replaces the last segment "folder", but the collapsed slash is the point):
>>> import urllib.parse
>>> urllib.parse.urljoin("file:////NETWORK/folder","style.css")
'file:///NETWORK/style.css'
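One way to see why the slash gets lost is to look at how urllib.parse splits the four-slash form. The snippet below is only a diagnostic sketch; the URL is the placeholder from this report, not a real share. The host part comes out empty and the server name ends up inside the path, behind a double slash that a join can then normalize away.

```python
from urllib.parse import urlsplit

# Placeholder URL in the four-slash form described in this report.
parts = urlsplit("file:////NETWORK/folder/")

print(repr(parts.scheme))  # 'file'
print(repr(parts.netloc))  # '' (no host: NETWORK is not parsed as the server)
print(repr(parts.path))    # '//NETWORK/folder/' (server name is part of the path)
```

So from the parser's point of view there is no network host at all, only a local path that happens to start with two slashes.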
It might be a good idea to treat URLs and Windows network paths differently, or to use a tool other than urllib.parse.urljoin to combine base_url and url. I am not sure where and how this is best handled, though.
I managed to work around it with a custom url_fetcher:
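def custom_url_fetcher(url):
    # Only rewrite the broken three-slash file URLs; delegate everything else.
    if not url.startswith("file:///"):
        return weasyprint.default_url_fetcher(url)
    # Turn file:///NETWORK/folder/style.css back into \\NETWORK/folder/style.css
    path = url.replace("file:///", r"\\", 1)
    # Read bytes so WeasyPrint can handle the stylesheet's encoding itself.
    with open(path, "rb") as f:
        data = f.read()
    # Assumes every such URL is a stylesheet. No redirected_url is returned,
    # so the original file: URL stays the base for relative URLs.
    return dict(string=data, mime_type="text/css")

# Use the custom fetcher only when base_url is a UNC path.
if base_url.startswith(r"\\"):
    url_fetcher = custom_url_fetcher
else:
    url_fetcher = weasyprint.default_url_fetcher
# html_str is the HTML source, e.g. s from the example above.
html = weasyprint.HTML(string=html_str, base_url=base_url, url_fetcher=url_fetcher)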
It is kind of hacky, but it works for now. I still would consider this a bug, though.
Hi!
These URLs are not standard, that’s why they don’t work with urljoin.
Does file://NETWORK/folder work? It seems to be the preferred format for .NET for example, and it should work nicely with urljoin.
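To illustrate the joining part of that suggestion, here is a minimal sketch. The helper unc_to_file_url is hypothetical (it is not part of WeasyPrint or urllib), and NETWORK/folder are the placeholders used in this thread. It only shows that urljoin keeps the host intact with this form; whether urlopen can then open such a URL on Windows is a separate question.

```python
from urllib.parse import quote, urljoin, urlsplit


def unc_to_file_url(unc_path):
    # Hypothetical helper, not part of WeasyPrint: map a UNC directory such as
    # \\HOST\share\dir to the URL form file://HOST/share/dir/.
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: %r" % unc_path)
    host, _, rest = unc_path[2:].replace("\\", "/").partition("/")
    url = "file://" + host + "/" + quote(rest)
    # A trailing slash marks the base as a directory, so urljoin keeps its last segment.
    return url if url.endswith("/") else url + "/"


base = unc_to_file_url(r"\\NETWORK\folder")
joined = urljoin(base, "style.css")
print(base)                     # file://NETWORK/folder/
print(joined)                   # file://NETWORK/folder/style.css
print(urlsplit(joined).netloc)  # NETWORK
```

Here the server name is parsed as the host, so there is no leading double slash in the path for urljoin to collapse.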
Hi @liZe,
Sure, these are not well-formed URLs, but tell that to Microsoft ;-) That's why I did not open this issue against urllib. urllib.parse.urljoin joins URLs just fine unless you give it something that is not a URL, like a Windows network drive 🤷. So my suggestion for WeasyPrint would be to detect Windows network drives (startswith(r"\\")?) and not use urljoin in those cases. I am just not sure where exactly in the code this would make the most sense.
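A rough sketch of what such a detection could look like, independent of where it would live in WeasyPrint. The function join_base is a hypothetical name, not an existing API, and it uses pathlib.PureWindowsPath so the joining logic can be tried on any platform.

```python
from pathlib import PureWindowsPath
from urllib.parse import urljoin, urlsplit


def join_base(base_url, href):
    # Hypothetical helper, not WeasyPrint's API.
    # Hrefs with their own scheme (http:, data:, ...) are already absolute.
    if urlsplit(href).scheme:
        return href
    # UNC base: join as a Windows path instead of going through urljoin.
    if base_url.startswith("\\\\"):
        return str(PureWindowsPath(base_url) / href)
    # Everything else is treated as a regular URL.
    return urljoin(base_url, href)


print(join_base(r"\\NETWORK\folder", "style.css"))
print(join_base("http://example.com/a/", "style.css"))
```

The result for a UNC base is a file system path rather than a URL, so whatever fetches the resource would have to open it as a file. The sketch also does not normalize ".." segments in href.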
As to your suggestion:
import weasyprint
base_url = r"file://NETWORK/folder" # substitute with real network path that contains a style.css file
s = '<html><head><link rel="stylesheet" type="text/css" href="style.css"></head><body>TEST</body></html>'
weasyprint.HTML(string=s, base_url=base_url).write_pdf("test.pdf")
This code logs this ERROR at __init__:find_stylesheets:278:
Failed to load stylesheet at file://%5C%5CNETWORK%5Cfolder/style.css : URLError: <urlopen error [WinError 2] Das System kann die angegebene Datei nicht finden: '\\style.css'>
Note the major confusion with the slashes: a mix of escaped backslashes ("%5C"), normal backslashes ("\") and normal forward slashes ("/"). (The German message translates to "The system cannot find the file specified".)
No exception is raised and a PDF is generated, but it does not use the stylesheet.
If someone here has a Windows system, I’d love to find a real URL pattern for network drives that works with urlopen!
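For anyone who wants to try this on a Windows machine, a small probe along these lines might help. It is only a sketch: the candidate patterns are guesses, NETWORK/folder/style.css must be replaced with a real share and file, and probe is a hypothetical helper name.

```python
from urllib.request import urlopen

# Candidate URL patterns to try; these are guesses, not known-good forms.
CANDIDATES = [
    "file://NETWORK/folder/style.css",
    "file:////NETWORK/folder/style.css",
    "file://///NETWORK/folder/style.css",
]


def probe(candidates):
    # Try to open each URL and record whether urlopen could read from it.
    results = {}
    for url in candidates:
        try:
            with urlopen(url) as response:
                response.read(1)
            results[url] = "ok"
        except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
            results[url] = "failed: %r" % (exc,)
    return results


if __name__ == "__main__":
    for url, outcome in probe(CANDIDATES).items():
        print(url, "->", outcome)
```

Posting the output together with the Python version would make it easier to compare results across systems.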
If anyone can give more information on this, feel free to add a comment.