WeasyPrint icon indicating copy to clipboard operation
WeasyPrint copied to clipboard

Windows weasyprint infinite loop depending on server load

Open kreuzberger opened this issue 3 years ago • 10 comments

Hi! This may be chasing unicorns but with with weasyprint on windows (56.1) i ran into following problems:

  • Weasyprinting my local files (index.html generated by sphinx) into pdf works reproducible fine if executed "serial" - build with one core out of 6.
  • Building several pdfs with weasyprint (e.g. as part of CI build using all cores on server) on all cores fails. Weasyprint.exe seems to run into an infinite loop (cpu core 100%) and must be hard killed. This occurs not on a distinct index.html, it occurs randomly on one of the files.

Why should this happen? Is there an internal threading issue here? What is a good practice for this use case? On Linux (Ubuntu/Centos) everything works fine without any issues. GTK is the newest available on windows. Python is 3.10.3

Currently only building with one core seems to work reliable. Or may the builds randomly fail, independent of the used cores?

kreuzberger avatar Oct 14 '22 18:10 kreuzberger

Why should this happen? Is there an internal threading issue here? What is a good practice for this use case?

I don’t know what happens. The best way to find what’s wrong is probably to configure logs and see the process taking 100% CPU is stuck.

Currently only building with one core seems to work reliable. Or may the builds randomly fail, independent of the used cores?

That’s strange. If you launch your different renderings in separate processes, it shouldn’t be possible to have a thread-safety problem. But I’ve never used WeasyPrint intensively on Windows, so I’m probably not the best person to find where the problem comes from.

liZe avatar Oct 15 '22 13:10 liZe

Here is one log from a successfull build: (called weasyprint.exe -d ..)

WARNING: Ignored `box-shadow: None` at 19:5, unknown property.
WARNING: Ignored `box-shadow: None` at 31:5, unknown property.
INFO: Step 3 - Applying CSS
INFO: Step 4 - Creating formatting structure
INFO: Step 5 - Creating layout - Page 1
INFO: Step 5 - Creating layout - Page 2
INFO: Step 5 - Creating layout - Page 3
INFO: Step 5 - Creating layout - Page 4
INFO: Step 5 - Creating layout - Page 5
INFO: Step 5 - Creating layout - Page 6
INFO: Step 5 - Creating layout - Page 7
INFO: Step 5 - Creating layout - Repagination #1
INFO: Step 5 - Creating layout - Page 1 (up-to-date)
INFO: Step 5 - Creating layout - Page 2
INFO: Step 5 - Creating layout - Page 3 (up-to-date)
INFO: Step 5 - Creating layout - Page 4 (up-to-date)
INFO: Step 5 - Creating layout - Page 5 (up-to-date)
INFO: Step 5 - Creating layout - Page 6 (up-to-date)
INFO: Step 5 - Creating layout - Page 7 (up-to-date)
INFO: Step 5 - Creating layout - Repagination #2
INFO: Step 5 - Creating layout - Page 1 (up-to-date)
INFO: Step 5 - Creating layout - Page 2
INFO: Step 5 - Creating layout - Page 3 (up-to-date)
INFO: Step 5 - Creating layout - Page 4 (up-to-date)
INFO: Step 5 - Creating layout - Page 5 (up-to-date)
INFO: Step 5 - Creating layout - Page 6 (up-to-date)
INFO: Step 5 - Creating layout - Page 7 (up-to-date)

kreuzberger avatar Oct 16 '22 08:10 kreuzberger

Here is one log from a successfull build: (called weasyprint.exe -d ..)

This output is quite common, assuming that you have something like a table of contents on page 2.

These logs would actually be useful when the problem happens, as it would tell us whether there’s a layout problem with some content always waiting to be rendered on the next page (we would have a lot of pages), or an endless loop in a step of the layout (the rendering would be stuck somewhere, before the end of the document).

liZe avatar Oct 16 '22 08:10 liZe

(You can configure logging to store the output in a file. You can even change the logging format to include the processes and threads numbers, it will be easier to debug afterwards.)

liZe avatar Oct 16 '22 08:10 liZe

I am not sure if i understood it right. 1). Instead of using weasyprint.exe i use now weasyprint as module with HTML(indexefile).write_pdf. 2). I set loglevel to DEBUG but still see not more output from weasyrpint

'2022-10-16 12:41:12,984 pid-tid 8436-2880 Step 5 - Creating layout - Page 1'
'2022-10-16 12:41:13,186 pid-tid 8436-2880 Step 5 - Creating layout - Page 2'
'2022-10-16 12:41:13,477 pid-tid 8436-2880 Step 5 - Creating layout - Page 3'
'2022-10-16 12:41:13,517 pid-tid 8436-2880 Step 5 - Creating layout - Page 4'
'2022-10-16 12:41:13,569 pid-tid 8436-2880 Step 5 - Creating layout - Page 5'
'2022-10-16 12:41:13,652 pid-tid 8436-2880 Step 5 - Creating layout - Page 6'
'2022-10-16 12:41:13,675 pid-tid 8436-2880 Step 5 - Creating layout - Page 7'
'2022-10-16 12:41:13,697 pid-tid 8436-2880 Step 5 - Creating layout - Page 8'
'2022-10-16 12:41:13,735 pid-tid 8436-2880 Step 5 - Creating layout - Page 9'
'2022-10-16 12:41:13,759 pid-tid 8436-2880 Step 5 - Creating layout - Page 10'
'2022-10-16 12:41:13,795 pid-tid 8436-2880 Step 5 - Creating layout - Page 11'
'2022-10-16 12:41:13,839 pid-tid 8436-2880 Step 5 - Creating layout - Page 12'
'2022-10-16 12:41:13,850 pid-tid 8436-2880 Step 5 - Creating layout - Page 13'
'2022-10-16 12:41:13,868 pid-tid 8436-2880 Step 5 - Creating layout - Page 14'
'2022-10-16 12:41:13,948 pid-tid 8436-2880 Step 5 - Creating layout - Page 15'
'2022-10-16 12:41:13,963 pid-tid 8436-2880 Step 5 - Creating layout - Page 16'
'2022-10-16 12:41:14,001 pid-tid 8436-2880 Step 5 - Creating layout - Page 17'
'2022-10-16 12:41:14,012 pid-tid 8436-2880 Step 5 - Creating layout - Page 18'
'2022-10-16 12:41:14,049 pid-tid 8436-2880 Step 5 - Creating layout - Page 19'
'2022-10-16 12:41:14,063 pid-tid 8436-2880 Step 5 - Creating layout - Page 20'
'2022-10-16 12:41:14,544 pid-tid 8436-2880 Step 5 - Creating layout - Page 21'
'2022-10-16 12:41:14,598 pid-tid 8436-2880 Step 5 - Creating layout - Page 22'
'2022-10-16 12:41:14,647 pid-tid 8436-2880 Step 5 - Creating layout - Page 23'
'2022-10-16 12:41:14,684 pid-tid 8436-2880 Step 5 - Creating layout - Page 24'
'2022-10-16 12:41:14,724 pid-tid 8436-2880 Step 5 - Creating layout - Page 25'
'2022-10-16 12:41:14,770 pid-tid 8436-2880 Step 5 - Creating layout - Page 26'
'2022-10-16 12:41:14,774 pid-tid 8436-2880 Step 5 - Creating layout - Page 27'
'2022-10-16 12:41:14,893 pid-tid 8436-2880 Step 5 - Creating layout - Page 28'
'2022-10-16 12:41:15,001 pid-tid 8436-2880 Step 5 - Creating layout - Page 29'

It sees that the page layout creation is still done inbetween the same thread. Should i add MORE Debug Informations? in Wich function? e.g make_page in page.py?

'2022-10-16 13:47:22,303 pid-tid 3064-4276 Step 5 - Creating layout - Page 1'
'2022-10-16 13:47:22,304 pid-tid 3064-4276 Step 5.1 - make_page - Page 1'
'2022-10-16 13:47:22,305 pid-tid 3064-4276 Step 5.1.1 - make_page -context ready 1'
'2022-10-16 13:47:22,305 pid-tid 3064-4276 Step 5.1.2 - make_page - Page footnotes'
'2022-10-16 13:47:22,305 pid-tid 3064-4276 Step 5.1.3 - make_page - start empty page'
'2022-10-16 13:47:22,305 pid-tid 3064-4276 Step 5.1.4 - make_page - context_out_of_flow'
'2022-10-16 13:47:22,373 pid-tid 3064-4276 Step 5.1.4 - make_page - root_box'
'2022-10-16 13:47:22,374 pid-tid 3064-4276 Step 5.1.5 - make_page - finisch block formating'
'2022-10-16 13:47:22,374 pid-tid 3064-4276 Step 5.1.6 - make_page - page mager'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5 - make_page - child descendants'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,375 pid-tid 3064-4276 Step 5.1.5.1 - make_page - cover'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,376 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,377 pid-tid 3064-4276 Step 5.1.5.1 - make_page - child'
'2022-10-16 13:47:22,377 pid-tid 3064-4276 Step 5.1.5.1 - make_page - None'
'2022-10-16 13:47:22,377 pid-tid 3064-4276 Step 5.1.6 - return make_page'
'2022-10-16 13:47:22,377 pid-tid 3064-4276 Step 5 - Creating layout - Page 2'
'2022-10-16 13:47:22,379 pid-tid 3064-4276 Step 5.1 - make_page - Page 2'
'2022-10-16 13:47:22,379 pid-tid 3064-4276 Step 5.1.1 - make_page -context ready 2'
'2022-10-16 13:47:22,379 pid-tid 3064-4276 Step 5.1.2 - make_page - Page footnotes'
'2022-10-16 13:47:22,379 pid-tid 3064-4276 Step 5.1.3 - make_page - start empty page'
'2022-10-16 13:47:22,380 pid-tid 3064-4276 Step 5.1.4 - make_page - context_out_of_flow

This seems to stuck in "page.py" inbetween somethere here

    for box, containing_block, skip_stack in context_out_of_flow:
        box.position_y = root_box.content_box_y()
        if box.is_floated():
            out_of_flow_box, out_of_flow_resume_at = float_layout(
                context, box, containing_block, positioned_boxes,
                positioned_boxes, 0, skip_stack)
        else:
            assert box.is_absolutely_positioned()
            out_of_flow_box, out_of_flow_resume_at = absolute_box_layout(
                context, box, containing_block, positioned_boxes, 0,
                skip_stack)
        out_of_flow_boxes.append(out_of_flow_box)
        if out_of_flow_resume_at:
            broken_out_of_flow[out_of_flow_box] = (
                box, containing_block, out_of_flow_resume_at)
                ```

kreuzberger avatar Oct 16 '22 11:10 kreuzberger

Instead of using weasyprint.exe i use now weasyprint as module with HTML(indexefile).write_pdf.

👍️

I set loglevel to DEBUG but still see not more output from weasyrpint

👍️

It sees that the page layout creation is still done inbetween the same thread. Should i add MORE Debug Informations? in Wich function? e.g make_page in page.py? This seems to stuck in "page.py" inbetween somethere here

That’s even more than what I was asking for, thanks!

Interesting. That’s a place where we could have a problem trying to always render the same block in an endless loop. Could you please check that this for loop is actually endless, by logging something in it? If it’s actually stuck there in an endless loop, it would be very useful to get your document (HTML, CSS, images…) to find what’s wrong. You can attach it in a comment or send it by mail if it contains sensible data.

liZe avatar Oct 16 '22 12:10 liZe

layout.zip Attached both modified files.

In "singlebuild_case" the block_layout function in make_page is called about 11 times. in "multibuild_case" the block layout calculation stocks after the 8 time. I assume an endless loop in there. I stop debugging here. Main question still is for me, why does this always build when e.g. called from commandline and not working if called from CI. The main index file should always be the same.

Maybe i find time next week to get deeper int the layout with debugging messages. Thanks for your support.

'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1 - make_page - Page 10'
'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1.1 - make_page -context ready 10'
'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1.2 - make_page - Page footnotes'
'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1.3 - make_page - start empty page'
'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1.4 - make_page - context_out_of_flow'
'2022-10-16 15:12:41,853 pid-tid 5172-7784 Step 5.1.4.4 - make_page - end of '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step 5.1.4.5 - make_page - block_level_layout '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,854 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,855 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,856 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,857 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,858 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,859 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,859 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,859 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,859 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.5 - block.py - return '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.1 - block.py - block_level_layout '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.2 - block.py - resolve_percentages '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.3 - block.py - collapse_margin '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.4 - block.py - get_clearance '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.4 - block.py - get_clearance finsihed '
'2022-10-16 15:12:41,860 pid-tid 5172-7784 Step x.5 - block.py - return '
ninja: build stopped: subcommand failed.

kreuzberger avatar Oct 16 '22 13:10 kreuzberger

Debuging into got more and more deeper. so giving it up here. Unfortunately i cannot give you my internal project documentation. So i tried to generate the issues from "public" available documents. Attached is an example from https://github.com/useblocks/sphinx-simplepdf.

i wrote an test.cmd x that launches x processes of weasyprint.exe on windows. My NUMBER_OF_PROCESSORS is 6, so i started with test.cmd 12. This launches 24 python processes and 12 instances of weasyprint. After some time (1-2 Minutes) the weasyprint processes disspear and VERFY OFTEN (maybe on first start of testcmd, at least on second start on my machine) there will be one process left running "forever". Maybe it could be helpfor debugging this process then. demo.zip

kreuzberger avatar Oct 17 '22 10:10 kreuzberger

Thank you, I’ll try on my Windows VM.

liZe avatar Oct 17 '22 11:10 liZe

As it seams not so easy to reproduce on your site, feel free to close it. I could also close it, but just waited for some results from your tests.

kreuzberger avatar Nov 07 '22 17:11 kreuzberger

After changing to msys installation of gtk runtime libs the error never occured any more. The documentation still mentions the GTK Runtime installer, but it seems using the msys packages from the github workflow scripts is much better. Error could not be reproduced in this setup.

kreuzberger avatar Apr 30 '24 11:04 kreuzberger

Thanks a lot for the comment. I’ve never been able to reproduce on my VM.

The documentation still mentions the GTK Runtime installer, but it seems using the msys packages from the github workflow scripts is much better.

Maybe we should propose this solution instead. The GTK installer is not maintained anymore.

liZe avatar Apr 30 '24 11:04 liZe