gitbook2pdf
The content is truncated after `<script>`
When I convert The Go Programming Language into a PDF, the output PDF file is truncated after section 5.2.
The reason is that it uses `html.unescape()` to convert HTML escape sequences (character references) into the corresponding Unicode characters.
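Here is a minimal sketch of the mechanism. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, assuming both behave the same way here: their default `tostring()` output is ASCII, so non-ASCII text becomes numeric character references. The sample text is made up. It only needs some CJK characters and a literal `<script>` inside a text node.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample: a text node that mentions <script> as prose, not markup.
p = ET.Element("p")
p.text = "示例 <script> 文本"

# Default serialization is ASCII: CJK characters become &#NNNNN; references,
# and the literal angle brackets in the text are escaped as &lt; and &gt;.
serialized = ET.tostring(p).decode()

# html.unescape() restores the CJK characters, but it also turns the escaped
# text back into a real <script> start tag. An HTML parser would then treat
# everything after it as script content instead of visible text.
unescaped = html.unescape(serialized)

print(serialized)
print(unescaped)
```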
However, the original HTML code of "Exercise 5.3: Write a function that outputs the contents of all text nodes. Be careful not to visit
When I remove the `html.unescape()` call as follows, the output PDF contains the whole content.
```diff
 def parser(self):
     tree = ET.HTML(self.original)
     if tree.xpath('//section[@class="normal markdown-section"]'):
         context = tree.xpath('//section[@class="normal markdown-section"]')[0]
     else:
         context = tree.xpath('//section[@class="normal"]')[0]
     if context.find('footer'):
         context.remove(context.find('footer'))
     context = self.parsehead(context)
-    return html.unescape(ET.tostring(context).decode())
+    return ET.tostring(context).decode()
```
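Removing the call leaves non-ASCII text as numeric character references. This is still valid HTML and should render the same. A possible alternative keeps the text readable without any unescape step: serialize directly to a `str` with `encoding="unicode"`. The sketch below uses the standard library's ElementTree as a stand-in for lxml, whose `tostring()` also accepts `encoding="unicode"`. `to_html_fragment` is a hypothetical helper name, and the sample text is made up.

```python
import xml.etree.ElementTree as ET


def to_html_fragment(element: ET.Element) -> str:
    """Serialize an element to a str without a separate unescape step.

    encoding="unicode" keeps non-ASCII text as real characters, while
    markup-significant characters in text nodes stay escaped (&lt;, &gt;, &amp;).
    """
    return ET.tostring(element, encoding="unicode")


section = ET.Element("section")
p = ET.SubElement(section, "p")
p.text = "示例 <script> 文本"  # hypothetical sample text

fragment = to_html_fragment(section)
print(fragment)
```

In `parser()` this would roughly correspond to `return ET.tostring(context, encoding="unicode")`. That line is an untested suggestion, not the project's current code.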