ebooklib
Error in parsing NAV document containing <a> without href attribute
This can be reproduced on v0.17.1.
When parsing the NAV, the current implementation assumes the href attribute always exists in the <a> element.
    def _parse_nav(self, data, base_path, navtype='toc'):
        ...
        def parse_list(list_node):
            items = []
            for item_node in list_node.findall('li'):
                ...
                link_node = item_node.find('a')
                if sublist_node is not None:
                    ...
                    if link_node is not None:
                        href = zip_path.normpath(zip_path.join(base_path, link_node.get('href')))
                    ...
                elif link_node is not None:
                    title = link_node.text
                    href = zip_path.normpath(zip_path.join(base_path, link_node.get('href')))
                    ...
If the attribute is missing, link_node.get('href') returns None and zip_path.join throws an exception: 'NoneType' object has no attribute 'startswith'.
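A minimal reproduction sketch of the failure outside ebooklib. It uses the standard library's xml.etree.ElementTree instead of ebooklib's own parser, and it assumes zip_path behaves like posixpath; the 'OEBPS' base path is made up for illustration. The exact exception type and message may differ between Python versions, so both likely types are caught.

```python
import posixpath as zip_path  # assumption: stands in for ebooklib's zip_path
import xml.etree.ElementTree as ET

# An <a> element with no href, as found in the preview EPUB files.
link_node = ET.fromstring('<a>Chapter 29</a>')

# .get() returns None when the attribute is absent.
href_attr = link_node.get('href')

# Joining a base path with None fails; the exception type depends on
# the Python version, so catch both candidates.
try:
    zip_path.join('OEBPS', href_attr)  # 'OEBPS' is a hypothetical base path
    failure = None
except (AttributeError, TypeError) as exc:
    failure = exc

print(repr(href_attr), type(failure).__name__)
```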
I guess this assumption holds in most cases, but I have run into some EPUB files where it does not. These files are preview versions of full editions: they keep the whole TOC section but have some of the links inside removed, which leaves <a> elements without href, e.g.
<a>Chapter 29</a>
According to the W3C HTML5 draft, this seems to be allowed: https://www.w3.org/TR/2011/WD-html5-20110525/text-level-semantics.html#the-a-element
I guess it's not a common use case, but it would be nice if it could be handled.
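One possible way to handle it, as a rough sketch rather than a patch against ebooklib: check for a missing href before joining paths. The helper resolve_href and the simplified parse_list below are hypothetical names for illustration, the code uses xml.etree.ElementTree and posixpath as stand-ins, and it returns plain tuples instead of ebooklib's own TOC objects. Whether entries without a link should be kept with an empty target or skipped entirely is left open.

```python
import posixpath as zip_path  # assumption: stands in for ebooklib's zip_path
import xml.etree.ElementTree as ET


def resolve_href(base_path, link_node):
    """Return the normalized href of link_node, or None if it has no href."""
    raw = link_node.get('href')
    if raw is None:
        return None
    return zip_path.normpath(zip_path.join(base_path, raw))


def parse_list(list_node, base_path):
    """Collect (title, href) pairs; href is None for <a> without href."""
    items = []
    for item_node in list_node.findall('li'):
        link_node = item_node.find('a')
        if link_node is None:
            continue
        items.append((link_node.text, resolve_href(base_path, link_node)))
    return items


# A tiny NAV-like list: one normal link and one <a> without href.
nav = ET.fromstring(
    '<ol>'
    '<li><a href="text/ch28.xhtml">Chapter 28</a></li>'
    '<li><a>Chapter 29</a></li>'
    '</ol>'
)
entries = parse_list(nav, 'OEBPS')  # 'OEBPS' is a hypothetical base path
print(entries)
```

With this guard the entry for Chapter 29 is still listed, just without a target, so the rest of the TOC parses normally.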