Encoding property for SVG not honored

Open mormahr opened this issue 3 years ago • 1 comments

According to the MIME type registration for SVGs, the encoding can be specified via the charset parameter of the Content-Type header. Currently, WeasyPrint doesn't use the encoding property optionally returned by url_fetcher. Instead, it passes the content of file-like objects as plain bytes to ElementTree.fromstring(string). This method correctly uses the encoding attribute of the XML declaration, but it should also be possible to specify the encoding via the Content-Type header.
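To make the charset parameter concrete, here is a minimal sketch of pulling the MIME type and charset out of a Content-Type value with the standard library. The helper name split_content_type is hypothetical and is not part of WeasyPrint; a custom url_fetcher could do something similar before filling in its encoding property.

```python
from email.message import Message


def split_content_type(header_value):
    """Hypothetical helper: return (mime_type, charset or None)."""
    message = Message()
    message['Content-Type'] = header_value
    # get_content_type() returns the lowercased type/subtype;
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return message.get_content_type(), message.get_content_charset()


mime_type, charset = split_content_type('image/svg+xml; charset=ISO-8859-1')
```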

SVGs are handled in get_image_from_uri, particularly in L102, where the data is read as bytes. My initial instinct was to decode the data to a string right there, but this method, of course, also deals with raster images.

I think there are three options to handle the decoding:

  1. Move reading the MIME type up and decode in L102 conditionally based on the MIME type. I'm not sure how this should interact with the retry logic if no content type was given.
  2. Decode in L102 conditionally on whether an encoding was passed. This relies on users not passing an encoding/charset for content types that don't allow it, which I wouldn't recommend.
  3. Do it directly before calling ElementTree.fromstring(string) at both call sites. In this case, we need to check if the variable string is a string or bytes-like object.

If there is no encoding returned by the url_fetcher, I would leave the data as bytes, so the XML parser can use its own logic to decode.
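For illustration, option 3 could look roughly like the sketch below. parse_svg is a hypothetical wrapper, not existing WeasyPrint code, and it assumes the encoding value comes straight from url_fetcher. When no encoding is given, the bytes are handed to the parser unchanged.

```python
import xml.etree.ElementTree as ElementTree


def parse_svg(string, encoding=None):
    """Hypothetical wrapper around ElementTree.fromstring (option 3)."""
    if encoding is not None and isinstance(string, (bytes, bytearray)):
        # A charset was supplied by the fetcher: decode before parsing.
        string = bytes(string).decode(encoding)
    # Without an encoding the data stays as bytes, so the XML parser
    # uses its own logic (for example the XML declaration) to decode.
    return ElementTree.fromstring(string)


# Latin-1 bytes without an XML declaration: only the charset tells us
# how to decode them.
latin1_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg"><title>é</title></svg>'
).encode('latin-1')
root = parse_svg(latin1_svg, encoding='latin-1')
title = root[0].text
```

Calling parse_svg(latin1_svg) without the encoding fails in this example, because the parser assumes UTF-8 when the document has no XML declaration.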

mormahr avatar Feb 24 '22 03:02 mormahr

Hello, and thank you for this issue.

> I think there are three options to handle the decoding:

I like 1. When we have a file object or a string (that is actually a bytestring according to the docstring), we should read it and try to decode it according to the given encoding. If we have no encoding, or if we catch a UnicodeDecodeError, we should keep the bytestring.

ElementTree seems to be happy with Unicode strings and UTF-8-encoded bytestrings. Solving this issue should help with SVG documents served using another encoding.
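A rough sketch of that fallback behaviour, with the caveat that read_and_decode is a hypothetical name and not the actual WeasyPrint implementation. Catching LookupError for unknown charset names is an extra assumption that goes beyond what is described above.

```python
import io
import xml.etree.ElementTree as ElementTree


def read_and_decode(source, encoding=None):
    """Hypothetical helper: return str if decoding works, else bytes."""
    # Accept either a file object or a bytestring.
    data = source.read() if hasattr(source, 'read') else source
    if encoding is None:
        return data
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset (LookupError is an added assumption):
        # keep the bytestring and let the XML parser decide.
        return data


decoded = read_and_decode(io.BytesIO('é'.encode('latin-1')), 'latin-1')
kept = read_and_decode(b'\xe9', 'utf-8')  # invalid UTF-8, stays bytes

# Bytes with an XML declaration still parse through the fallback path.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
text = ElementTree.fromstring(read_and_decode(declared)).text
```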

liZe avatar Mar 07 '22 15:03 liZe