gitbook2pdf

The content is truncated after '< script >'

Open · ZiheLiu opened this issue 3 years ago · 0 comments

When I convert The Go Programming Language into PDF, the output PDF file is truncated after section 5.2.

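The reason is that the parser calls html.unescape() to convert escaped characters into the corresponding Unicode characters. However, the original HTML of "练习 5.3: 编写函数输出所有text结点的内容。注意不要访问…" ("Exercise 5.3: Write a function that prints the contents of all text nodes. Be careful not to visit…") contains the text <script> in escaped form (&lt;script&gt;). html.unescape() turns it back into a literal <script> tag, so everything after it is treated as script content and disappears from the PDF.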
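A minimal sketch of the failure. It assumes the standard library's xml.etree.ElementTree behaves like lxml here (both escape markup characters in text, and both replace non-ASCII characters with numeric character references when serializing to the default ASCII encoding). The sample sentence is hypothetical, not the book's actual text.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical paragraph whose text mentions <script>, mixing Chinese and ASCII.
p = ET.Element("p")
p.text = "练习: do not visit <script> elements"

# With the default ASCII encoding, non-ASCII characters become numeric
# character references (&#...;) and "<" / ">" in the text become "&lt;" / "&gt;".
serialized = ET.tostring(p).decode()
print(serialized)

# html.unescape() restores the Chinese characters, but it also turns the
# escaped "&lt;script&gt;" back into a literal <script> tag, which a renderer
# then treats as the start of a script element.
unescaped = html.unescape(serialized)
print(unescaped)
```

The unescape step was presumably there only to make the Chinese text readable again; the same call cannot tell a character reference apart from escaped markup.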

When I remove the html.unescape() call as follows, the output PDF contains the whole content.

    def parser(self):
        tree = ET.HTML(self.original)
        if tree.xpath('//section[@class="normal markdown-section"]'):
            context = tree.xpath('//section[@class="normal markdown-section"]')[0]
        else:
            context = tree.xpath('//section[@class="normal"]')[0]
        if context.find('footer'):
            context.remove(context.find('footer'))
        context = self.parsehead(context)
-       return html.unescape(ET.tostring(context).decode())
+       return ET.tostring(context).decode()
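
Dropping html.unescape() is safe because numeric character references such as &#32451; are valid HTML and render as the original characters. If readable non-ASCII text in the intermediate HTML is wanted, one possible alternative (an assumption, not the fix proposed in this issue) is to serialize with encoding="unicode", which keeps the characters as-is while still escaping markup. lxml's etree.tostring also accepts encoding="unicode"; the sketch below uses the standard library's ElementTree so it runs without lxml, and the sample sentence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph mixing Chinese text and a literal "<script>" mention.
p = ET.Element("p")
p.text = "练习: do not visit <script> elements"

# encoding="unicode" returns a str, keeps non-ASCII characters literally,
# and still escapes "<" and ">" in the text, so no unescape step is needed.
serialized = ET.tostring(p, encoding="unicode")
print(serialized)  # <p>练习: do not visit &lt;script&gt; elements</p>
```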

ZiheLiu, Feb 25 '21 11:02