gitbook2pdf The content is truncated after '< script >'

Open ZiheLiu opened this issue 3 years ago • 0 comments

When I convert The Go Programming Language to PDF, the output PDF file is truncated after section 5.2.

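The reason is that it uses html.unescape() to convert HTML escape sequences into the corresponding Unicode characters. However, the original HTML code of Exercise 5.3 (in English: "Write a function that prints the contents of all text nodes. Take care not to visit") "练习 5.3：编写函数输出所有text结点的内容。注意不要访问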

When I remove the html.unescape() call as follows, the output PDF contains the whole content.

    def parser(self):
        tree = ET.HTML(self.original)
        if tree.xpath('//section[@class="normal markdown-section"]'):
            context = tree.xpath('//section[@class="normal markdown-section"]')[0]
        else:
            context = tree.xpath('//section[@class="normal"]')[0]
        if context.find('footer'):
            context.remove(context.find('footer'))
        context = self.parsehead(context)
-       return html.unescape(ET.tostring(context).decode())
+       return ET.tostring(context).decode()
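
To illustrate why the unescape step can cut off the output, here is a minimal standard-library sketch. It assumes the serialized section contains the literal text `<script>` escaped as `&lt;script&gt;`, and it uses a hypothetical `VisibleText` helper (not part of gitbook2pdf) as a stand-in for whatever later renders the HTML. After html.unescape() the escaped text becomes a real opening tag that is never closed, so everything after it is treated as script content instead of page text.

```python
import html
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Hypothetical stand-in for a renderer: collects text outside <script>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Text inside a <script> element is never shown to the reader.
        if not self.in_script:
            self.chunks.append(data)


def visible_text(markup):
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Assumed sample input: the escaped form a serializer would emit for text
# that mentions a script tag, followed by a later paragraph.
serialized = "<p>Skip &lt;script&gt; elements.</p><p>Section 5.3</p>"

print(repr(visible_text(serialized)))                 # escaped: all text survives
print(repr(visible_text(html.unescape(serialized))))  # unescaped: cut at <script>
```

With the escaped input both paragraphs come through. With the unescaped input only the text before `<script>` survives, which matches the truncation described above.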

Feb 25 '21 11:02 ZiheLiu
