Support IRIs instead of URIs

Open swmerko opened this issue 8 years ago • 2 comments

Hi, when I try to rasterize an SVG whose href attribute contains a non-ASCII character (this one: °), I get a UnicodeEncodeError: 'ascii' codec can't encode character '\xb0' in position 22: ordinal not in range(128)

I prepared this small example:

from urllib.request import urlopen
from xml.dom import minidom

import cairosvg

# Download the SVG and parse it into a DOM document.
svg_url = 'http://static.photosi.com/repository/test/ascii_in_file.svg'
svg_object = urlopen(svg_url)
svg_document = minidom.parse(svg_object)

# Serialize the document back to a string, then to UTF-8 bytes.
svg = svg_document.toxml()
svg_byte = svg.encode('utf8')

# Rasterize; this call raises the UnicodeEncodeError.
cairosvg.svg2png(bytestring=svg_byte, parent_width=800, parent_height=550)

If you run print(svg_document.toxml()), you can see that the character is correctly encoded as UTF-8 (there's q°_.jpg instead of the original ASCII q°_.jpg).

Am I doing something wrong? UTF-8 chars are OK in URLs. Is this a bug?
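
The error reported above can be reproduced without CairoSVG or any network access. The following is a minimal sketch, assuming the failure comes from some layer encoding the image URL as ASCII before fetching it (the thread does not confirm where exactly that happens). The path is hypothetical, modeled on the file name in the report.

```python
# Hypothetical path modeled on the file name in the report.
path = "/q°_.jpg"

try:
    # Strict ASCII encoding cannot represent U+00B0 (the degree sign).
    path.encode("ascii")
except UnicodeEncodeError as exc:
    # Keep the message: the name bound by "as" is cleared after the block.
    message = str(exc)
    print(message)
```

The printed message has the same form as the one in the report; only the position differs, because it depends on where the character sits in the encoded string.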

swmerko avatar Jul 18 '17 16:07 swmerko

utf-8 chars are ok in url

Citation needed :wink:. They're not OK, according to RFC 1738 and Wikipedia:

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

Other [than listed reserved and unreserved] characters in a URI must be percent encoded.

I'm not sure that there's no bug about encoded URLs in CairoSVG, but you'll need to convince me!
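
For illustration, the percent-encoding described in the quotes above can be sketched with Python's standard library. The file name is hypothetical (taken from the report), and this only shows what a compliant URL would look like; it is not a statement about what CairoSVG does internally.

```python
from urllib.parse import quote

# Hypothetical file name modeled on the one in the report.
name = "q°_.jpg"

# quote() encodes the text as UTF-8 by default, then percent-encodes
# every byte that is not an unreserved ASCII character.
encoded = quote(name)
print(encoded)  # q%C2%B0_.jpg
```

Writing the encoded form in the SVG's href attribute keeps the URL within US-ASCII.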

liZe avatar Jul 18 '17 16:07 liZe

I understand, you are right. You can use IRIs according to this RFC. It's modern, but it's not mandatory. ;)
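
As a rough sketch of what IRI support could involve, an IRI can be mapped to an ASCII-only URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The helper name `iri_to_uri` and the example URL are hypothetical, and the sketch assumes an ASCII host name (internationalized domain names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched; "%" is included so that escapes already
# present in the input are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII characters of an IRI."""
    parts = urlsplit(iri)
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    # Assumption: scheme and host are already ASCII and are kept as-is.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


uri = iri_to_uri("http://example.com/dir/q°_.jpg")
print(uri)  # http://example.com/dir/q%C2%B0_.jpg
```

Because "%" is treated as safe, applying the helper to an already encoded URI returns it unchanged.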

swmerko avatar Aug 03 '17 07:08 swmerko