
[Parser] Could not get title for TOC items where <a> contains nested element

Open unext-wendong opened this issue 6 years ago • 0 comments

This can be reproduced on v0.17.1.

For a NAV document with the following content (which the EPUB spec appears to allow, see http://www.idpf.org/epub/301/spec/epub-contentdocs.html#sec-xhtml-nav-def-model):

<nav epub:type="toc" id="toc">
    <ol>
        <li>
            <a href="xhtml/001.xhtml"><span style="color:blue;">Section 1</span></a>
        </li>
        <li>
            <a href="xhtml/004.xhtml"><span style="color:blue;">Section 2</span></a>
        </li>
        ...
    </ol>
</nav>

EpubReader returns None as the title for these two TOC items, because the implementation below reads only link_node.text, which is None when the <a> has no text before its first child element:

def _parse_nav(self, data, base_path, navtype='toc'):
    ...

    def parse_list(list_node):
        items = []

        for item_node in list_node.findall('li'):

            sublist_node = item_node.find('ol')
            link_node = item_node.find('a')

            if sublist_node is not None:
                ...
            elif link_node is not None:
                title = link_node.text
                href = zip_path.normpath(zip_path.join(base_path, link_node.get('href')))

                items.append(Link(href, title))

        return items

It would be nice if this could be supported.
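
One possible direction, as a minimal sketch rather than a patch against ebooklib itself: collect the text of the link and all of its descendants with itertext() instead of reading .text. The example uses the standard library's xml.etree.ElementTree so that it runs on its own; lxml elements provide itertext() as well. The link_title helper and the sample markup (with the epub:type attribute left out to avoid a namespace declaration) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample NAV content; the second and third items are added
# to show mixed and plain-text links alongside the nested <span> case.
NAV = """
<nav id="toc">
    <ol>
        <li><a href="xhtml/001.xhtml"><span style="color:blue;">Section 1</span></a></li>
        <li><a href="xhtml/004.xhtml">Section <em>2</em></a></li>
        <li><a href="xhtml/007.xhtml">Section 3</a></li>
    </ol>
</nav>
""".strip()


def link_title(link_node):
    """Hypothetical helper: join the text of link_node and its descendants."""
    return ''.join(link_node.itertext()).strip()


root = ET.fromstring(NAV)
links = root.findall('./ol/li/a')

# What reading .text gives: only the text before the first child element.
text_titles = [a.text for a in links]

# What itertext() gives: the full visible text of each link.
full_titles = [link_title(a) for a in links]

print(text_titles)
print(full_titles)
```

With this approach the first item yields "Section 1" instead of None, and a link that mixes plain text with inline markup is no longer cut off at the first child element. Whether surrounding whitespace should be stripped or collapsed is a design choice left open here.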

unext-wendong, Feb 04 '19 11:02