Failed to load stylesheet from Windows network drive

Open m-ad opened this issue 2 years ago • 4 comments

I am trying to work with an HTML input that references a CSS stylesheet on a Windows network share (a path starting with a double backslash). Simplified example:

import weasyprint
s = '<html><head><link rel="stylesheet" type="text/css" href="style.css"></head><body>TEST</body></html>'
base_url = r"\\NETWORK\folder" # replace with real network path that contains a style.css file
weasyprint.HTML(string=s, base_url=base_url).write_pdf("test.pdf")

I get this error message:

Failed to load stylesheet at file:///NETWORK/folder/style.css : URLError: <urlopen error [WinError 3] The system cannot find the path specified: '\\NETWORK\\folder\\style.css'>

Note that file:///NETWORK/folder/style.css has only three slashes after "file:", so one slash is missing.

After some introspection I think I found the culprit here https://github.com/Kozea/WeasyPrint/blob/master/weasyprint/urls.py#L130.

Up to this point base_url is "file:////NETWORK/folder" (four slashes) and url is "style.css", but urljoin removes one of the slashes and we end up with the broken "file:///NETWORK/folder/style.css" (three slashes).

Like this:

>>> import urllib.parse
>>> urllib.parse.urljoin("file:////NETWORK/folder","style.css")
'file:///NETWORK/style.css'

Might be a good idea to treat URLs and Windows network paths differently or use some other tool than urllib.parse.urljoin to put base_url and url together. I am not sure at which place and in which way this is best treated, though.
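To see where the slash goes, it may help to look at how urllib.parse splits the four-slash base URL. This is only a small sketch that uses the placeholder host and folder from the report. As far as I can tell, the empty path segment created by the doubled slash is what urljoin drops when it rebuilds the path.

```python
from urllib.parse import urljoin, urlsplit

base_url = "file:////NETWORK/folder"  # four slashes, as in the report

parts = urlsplit(base_url)
# The authority (netloc) is whatever sits between the 2nd and 3rd slash,
# so it is '' here and the host name lands in the path: '//NETWORK/folder'.
print(repr(parts.netloc))
print(repr(parts.path))

# urljoin resolves the relative reference against that path; the result
# is printed rather than asserted, compare it with the REPL session.
joined = urljoin(base_url, "style.css")
print(joined)
```

The REPL call also loses "folder", but that part is ordinary relative-reference resolution: a base without a trailing slash has its last segment replaced. It is separate from the slash problem.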

m-ad, Feb 23 '23 14:02

I managed this workaround with a custom url_fetcher:

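import weasyprint

def custom_url_fetcher(url):
    # Undo the slash collapse: "file:///NETWORK/..." becomes "\\NETWORK/...".
    url = url.replace("file:///", r"\\")
    with open(url, "r") as f:
        string = f.read()
    # Every resource is reported as CSS, so this only suits stylesheets.
    return dict(string=string, mime_type="text/css", redirect_url=url)

# Only bypass the default fetcher when base_url is a Windows network path.
if base_url.startswith(r"\\"):
    url_fetcher = custom_url_fetcher
else:
    url_fetcher = weasyprint.default_url_fetcher
html = weasyprint.HTML(string=html_str, base_url=base_url, url_fetcher=url_fetcher)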

It is kind of hacky, but it works for now. I still would consider this a bug, though.
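A slightly sturdier variant of the same idea, as a sketch only: it reads the file as bytes, guesses the MIME type instead of hard-coding text/css, and hands anything that is not a three-slash file URL to a fallback fetcher. The names unc_path_from_url and make_unc_fetcher are made up for this example and are not part of WeasyPrint. Like the workaround, it assumes every file:/// URL really points at a network share.

```python
import mimetypes
from urllib.parse import unquote

FILE_PREFIX = "file:///"


def unc_path_from_url(url):
    """Hypothetical helper: map a collapsed file:/// URL back to a UNC path.

    Returns None for any URL that does not start with file:///.
    """
    if not url.startswith(FILE_PREFIX):
        return None
    # Decode percent-escapes and switch to backslashes after the \\ prefix.
    return "\\\\" + unquote(url[len(FILE_PREFIX):]).replace("/", "\\")


def make_unc_fetcher(fallback_fetcher):
    """Build a fetcher that opens UNC paths itself and delegates the rest."""
    def fetch(url):
        path = unc_path_from_url(url)
        if path is None:
            return fallback_fetcher(url)
        with open(path, "rb") as f:
            result = {"string": f.read()}
        mime_type, _ = mimetypes.guess_type(path)
        if mime_type is not None:
            result["mime_type"] = mime_type
        return result
    return fetch
```

It would be wired in as url_fetcher=make_unc_fetcher(weasyprint.default_url_fetcher), again only when base_url starts with a double backslash.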

m-ad, Mar 27 '23 13:03

Hi!

These URLs are not standard, that’s why they don’t work with urljoin.

Does file://NETWORK/folder work? It seems to be the preferred format for .NET for example, and it should work nicely with urljoin.
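For reference, the host-as-authority form is also what Python's pathlib produces for a UNC path, and urljoin keeps the host intact when the base is written that way. This is a sketch using the placeholder names from the report. It says nothing about whether urlopen on Windows can open the resulting URL.

```python
from pathlib import PureWindowsPath
from urllib.parse import urljoin, urlsplit

# pathlib renders a UNC path with the host in the authority position.
css_url = PureWindowsPath(r"\\NETWORK\folder\style.css").as_uri()
print(css_url)

# With the host in the authority, urljoin has no doubled slash to collapse.
# The trailing slash on the base keeps "folder" in the result.
joined = urljoin("file://NETWORK/folder/", "style.css")
print(joined)
print(urlsplit(joined).netloc)
```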

liZe, Apr 08 '23 08:04

Hi @liZe,

sure, these are not well-formed URLs, but tell that to Microsoft ;-) That's why I did not open this issue in urllib. urllib.parse.urljoin joins URLs just fine unless you give it something that is not a URL, like a Windows network drive 🤷‍♂️. So my suggestion for WeasyPrint would be to detect Windows network drives (startswith(r"\\")?) and not use urljoin in these cases. I am just not sure where exactly in the code this would make the most sense.
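A minimal sketch of what such a detection could look like, assuming the base is still the raw backslash path at that point. resolve_against_base is a hypothetical name and this is not WeasyPrint's code. It joins with ntpath for UNC bases and falls back to urljoin for everything else.

```python
import ntpath
from urllib.parse import unquote, urljoin, urlsplit


def resolve_against_base(base_url, href):
    """Hypothetical helper: join href onto base_url without urljoin for UNC bases."""
    is_unc_base = base_url.startswith("\\\\")
    href_has_scheme = bool(urlsplit(href).scheme)
    if is_unc_base and not href_has_scheme:
        # Treat the relative reference as a Windows path fragment.
        relative = unquote(href).replace("/", "\\")
        return ntpath.normpath(ntpath.join(base_url, relative))
    # Regular URLs (and absolute hrefs) keep the usual URL semantics.
    return urljoin(base_url, href)


print(resolve_against_base(r"\\NETWORK\folder", "style.css"))
print(resolve_against_base("https://example.com/docs/", "style.css"))
```

The UNC branch returns a plain Windows path rather than a URL, so whatever fetches the resource would have to open it directly instead of passing it to urlopen.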

As to your suggestion:

import weasyprint
base_url = r"file://NETWORK/folder"  # substitute with real network path that contains a style.css file
s = '<html><head><link rel="stylesheet" type="text/css" href="style.css"></head><body>TEST</body></html>'
weasyprint.HTML(string=s, base_url=base_url).write_pdf("test.pdf")

This code logs this ERROR at __init__:find_stylesheets:278:

Failed to load stylesheet at file://%5C%5CNETWORK%5Cfolder/style.css : URLError: <urlopen error [WinError 2] The system cannot find the file specified: '\\style.css'>

Note the major confusion with the slashes: a mix of escaped backslashes ("%5C"), plain backslashes ("\") and forward slashes ("/").

No exception is raised and a PDF is generated, but it does not use the stylesheet.

m-ad, Apr 19 '23 07:04

If someone here has a Windows system, I’d love to find a real URL pattern for network drives that works with urlopen!

liZe, Aug 19 '23 10:08

If anyone can give more information on this, feel free to add a comment.

liZe, Jun 07 '24 07:06