ebooklib
Etree python 3.5 fix
I found some problems with etree on Python 3. The etree component was returning bytes in Python 3 instead of a Unicode string, so I made a small change to fix this.
[XML etree documentation](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.tostring)
Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated)
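A minimal sketch of the behaviour quoted above, using the standard library's `xml.etree.ElementTree` that the link documents. The sample markup is made up for illustration, and this assumes the etree used in the project behaves the same way for these two `encoding` values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the project.
body = ET.fromstring("<body><p>caf\u00e9</p></body>")

# A named byte encoding such as "utf-8" yields a bytes object.
as_bytes = ET.tostring(body, encoding="utf-8")

# The special value "unicode" yields a str instead.
as_text = ET.tostring(body, encoding="unicode")

print(type(as_bytes).__name__, type(as_text).__name__)
```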
I checked it out, and I would say this should be the fix. The problem is that we are trying to find a 'str' in 'bytes', which fails on Python 3. The rest of the code returns 'bytes' everywhere, so I assume we should do the same here; if you need a Unicode string, you should convert it manually. I will think about this issue a bit more.
tree_str = etree.tostring(body, pretty_print=True, encoding='utf-8', xml_declaration=False)
if tree_str.startswith(six.b('<body>')):
    n = tree_str.rindex(six.b('</body>'))
    return tree_str[7:n]
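To illustrate why the prefix has to be a byte string, here is a small standalone sketch. The helper name `strip_body_wrapper` and the sample input are hypothetical; it uses a plain `b'...'` literal instead of `six.b`, which is an assumption that only Python 3 (or 2.6+) needs to be supported.

```python
def strip_body_wrapper(tree_bytes):
    """Return the bytes between <body> and the last </body>, if present."""
    prefix = b"<body>"
    if tree_bytes.startswith(prefix):
        end = tree_bytes.rindex(b"</body>")
        # Skip the tag plus the newline that pretty-printing puts after it.
        return tree_bytes[len(prefix) + 1:end]
    return tree_bytes


# Hypothetical pretty-printed output, serialized as bytes.
sample = b"<body>\n  <p>Hi</p>\n</body>\n"
inner = strip_body_wrapper(sample)

# Searching bytes with a str prefix raises TypeError on Python 3.
try:
    sample.startswith("<body>")
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True
```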
This appears to be a limitation of XML, so you may want to add your example of how to handle Unicode strings to the documentation.
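One possible shape for such a documentation example, as a sketch only: the helper `to_text` is hypothetical and not part of the library, and it assumes the returned bytes are UTF-8 encoded, as in the `tostring` call discussed above.

```python
def to_text(content, encoding="utf-8"):
    """Decode bytes to str; pass str values through unchanged."""
    if isinstance(content, bytes):
        return content.decode(encoding)
    return content


# Hypothetical bytes content, standing in for what the library returns.
raw = b"<p>caf\xc3\xa9</p>"
text = to_text(raw)
print(text)
```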