ebooklib
[Parser] Could not get title for TOC items where <a> contains nested element
This can be reproduced on v0.17.1.
Consider a NAV document with the following content. (The EPUB spec appears to allow this: http://www.idpf.org/epub/301/spec/epub-contentdocs.html#sec-xhtml-nav-def-model)
<nav epub:type="toc" id="toc">
  <ol>
    <li>
      <a href="xhtml/001.xhtml"><span style="color:blue;">Section 1</span></a>
    </li>
    <li>
      <a href="xhtml/004.xhtml"><span style="color:blue;">Section 2</span></a>
    </li>
    ...
  </ol>
</nav>
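The implementation below reads the title from link_node.text, which holds only the text that appears before the first child element of <a>. As a result, EpubReader returns None as the title for these two TOC items: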
def _parse_nav(self, data, base_path, navtype='toc'):
    ...
    def parse_list(list_node):
        items = []
        for item_node in list_node.findall('li'):
            sublist_node = item_node.find('ol')
            link_node = item_node.find('a')
            if sublist_node is not None:
                ...
            elif link_node is not None:
                title = link_node.text
                href = zip_path.normpath(zip_path.join(base_path, link_node.get('href')))
                items.append(Link(href, title))
        return items
It would be nice if this could be supported.
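One possible direction, sketched here with the standard library's xml.etree.ElementTree rather than ebooklib's own parser: collect all text inside the <a> element with itertext() instead of reading .text. The helper name link_title and the second sample entry (with mixed text and a nested <em>) are hypothetical and only for illustration. lxml elements also provide itertext(), but whether this drops cleanly into _parse_nav is an assumption, not something verified against the library.

```python
import xml.etree.ElementTree as ET

# Simplified NAV list; namespaces are omitted to keep the sketch short.
NAV_SNIPPET = """
<ol>
  <li><a href="xhtml/001.xhtml"><span style="color:blue;">Section 1</span></a></li>
  <li><a href="xhtml/004.xhtml">Section <em>2</em></a></li>
</ol>
"""


def link_title(link_node):
    """Hypothetical helper: join all text inside <a>, nested elements included."""
    title = "".join(link_node.itertext()).strip()
    return title or None


list_node = ET.fromstring(NAV_SNIPPET)
links = [li.find("a") for li in list_node.findall("li")]

# .text stops at the first child element, so nested titles are lost.
print([a.text for a in links])            # [None, 'Section ']

# itertext() walks the whole subtree, so the full title is recovered.
print([link_title(a) for a in links])     # ['Section 1', 'Section 2']
```

In parse_list, this would amount to replacing title = link_node.text with a call to such a helper; the same concern may apply to the sublist branch elided above.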