Rendering 汉字 (Chinese characters) in a `code` tag does not work

lar-ry opened this issue

When I render <code>汉字</code>, WeasyPrint shows this error:

invalid literal for int() with base 16: ''

Version 55.0 works correctly, but versions 56.0b1, 56.0 and 56.1 fail.

lar-ry, Sep 15 '22 03:09

Hi!

Could you please send the PDF generated by version 55 and the one generated by version 56 (if the error doesn’t make WeasyPrint crash)?

liZe, Sep 15 '22 08:09

Version 55 works normally:

    Successfully installed weasyprint-55.0
    PS C:\Users\larry\md2pdf> python .\md2pdf.py doc/index.md -s .\style\paper-zh.less -r all
    🎉 doc/index has converted.
    PS C:\Users\larry\md2pdf>

(screenshot) Generated PDF: index.pdf


Version 56 fails and cannot generate the PDF:

    Successfully installed weasyprint-56.0
    PS C:\Users\larry\md2pdf> python .\md2pdf.py doc/index.md -s .\style\paper-zh.less -r all
    invalid literal for int() with base 16: ''
    PS C:\Users\larry\md2pdf>

lar-ry, Sep 20 '22 01:09

I have two questions:

  1. Could you please share (by mail) the font file used in your document with version 55? It's 新宋体 (NSimSun).
  2. We need a traceback to know why WeasyPrint crashes. Could you please make your md2pdf.py script display the whole traceback?

liZe, Sep 23 '22 13:09

  1. The fonts are SourceHanSansSC-Regular.otf and SourceHanSerifSC-Regular.otf (attached: SourceHanSansSC-Regular.zip, SourceHanSerifSC-Regular.zip).
  2. The whole traceback:
    PS C:\Users\larry\md2pdf> python .\md2pdf.py doc/index.md -s .\style\paper-zh.less -r all
    Traceback (most recent call last):
      File "C:\Users\larry\md2pdf\md2pdf.py", line 138, in <module>
        convert(args.markdown, args.style, args.reserve)
      File "C:\Users\larry\md2pdf\md2pdf.py", line 58, in convert
        html.write_pdf(file_name + ".pdf")
      File "C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\__init__.py", line 201, in write_pdf
        .write_pdf(
      File "C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\document.py", line 335, in write_pdf
        pdf = generate_pdf(
      File "C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\pdf\__init__.py", line 445, in generate_pdf
        pdf_fonts = build_fonts_dictionary(pdf, fonts, optimize_size)
      File "C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\pdf\fonts.py", line 83, in build_fonts_dictionary
        _build_bitmap_font_dictionary(
      File "C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\pdf\fonts.py", line 208, in _build_bitmap_font_dictionary
        bits = bin(int(data.hex(), 16))[2:]
    ValueError: invalid literal for int() with base 16: ''
    PS C:\Users\larry\md2pdf>
    

lar-ry, Sep 26 '22 07:09

Thanks a lot for this very useful answer.

The problem is caused by a bitmap font. WeasyPrint has supported bitmap fonts since version 56; that's why rendering worked with version 55, which used another font instead. Unfortunately, the fonts you shared are not bitmap fonts, so they are not the ones causing the problem.

If you have some Python skills, you could save this font using a debugger (the content of the file is in `font.file_content`). Or maybe you know which bitmap fonts are installed on your system. Would you be interested in trying to find the font causing the problem?

liZe, Sep 26 '22 12:09

@lar-ry Is there a way we could help you solve this problem?

liZe, Oct 11 '22 07:10

I had to continue using version 55 because this font was required.

lar-ry, Oct 11 '22 07:10

OK, I understand.

The bug is actually caused by another font, a bitmap font installed on your system. The problem is that we don’t know which font it is.

Do you have some programming skills? If so, I can tell you how to find the font, so that I can fix the problem. But if you don't have programming skills or don't want to spend more time on this, no problem 😁️: we can close this issue.

liZe, Oct 11 '22 07:10

I want to try. What do I need to do?

lar-ry, Oct 11 '22 07:10

Thank you!

Here’s what you can do:

  • Install WeasyPrint 56.1.
  • Open `C:\Users\larry\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\weasyprint\pdf\fonts.py`.
  • On line 40, add `open('C:\\Users\\larry\\font-' + font.hash, 'wb').write(font.file_content)`. You should have something like:
    for font in fonts.values():
        open('C:\\Users\\larry\\font-' + font.hash, 'wb').write(font.file_content)
        widths = pydyf.Array()
  • Generate your document (you’ll have the bug, it’s normal).

You should now have one or more files named font-XXXXXX (with uppercase letters instead of X) in your C:\Users\larry\ folder. Please attach them to a comment or send them by mail; those are the files I need!

liZe, Oct 11 '22 08:10

I did it, and it created 3 files (attached: fonts.zip).

lar-ry, Oct 11 '22 08:10

Font files:

  • Source Han Sans SC
  • Source Han Serif SC
  • 宋体 (SimSun)

PS: You are really professional, thank you.

lar-ry, Oct 11 '22 08:10

Thank you very much for your time, I can now reproduce your bug! I’ll come back as soon as I find a way to fix it.

liZe, Oct 11 '22 08:10

The bug should now be fixed. The 宋体 (SimSun) font includes both bitmap characters and vector characters. We should use the vector variant instead (see #1736), which removes the bug.

Moreover, the crash happened when characters had no data (i.e. they had a width of 0). If we ever meet such characters in bitmap-only fonts again, we'll now include them correctly.
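The crash can be reproduced in isolation with the expression from the traceback, `bin(int(data.hex(), 16))[2:]`: for a glyph with no bitmap data, `data.hex()` is `''`, and `int('', 16)` raises exactly the `ValueError` reported in this issue. The sketch below contrasts that expression with a hypothetical guarded variant (`bits_safe`, an illustration only and not the actual WeasyPrint fix) that converts each byte to 8 bits, so empty data simply yields an empty string. Taken on its own, the original expression also drops leading zero bits.

```python
def bits_original(data: bytes) -> str:
    """Expression from the traceback: raises ValueError when data is empty."""
    return bin(int(data.hex(), 16))[2:]


def bits_safe(data: bytes) -> str:
    """Hypothetical guarded variant: exactly 8 '0'/'1' characters per byte."""
    # An empty glyph (zero width, no bitmap data) yields '' instead of crashing.
    return "".join(f"{byte:08b}" for byte in data)


try:
    bits_original(b"")
except ValueError as error:
    print(error)  # invalid literal for int() with base 16: ''

print(bits_safe(b""))          # prints an empty line
print(bits_safe(b"\x0f\xf0"))  # 0000111111110000
print(bits_original(b"\x0f"))  # 1111 (the four leading zero bits are lost)
```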

liZe, Oct 11 '22 11:10