WeasyPrint icon indicating copy to clipboard operation
WeasyPrint copied to clipboard

Don’t discard the whole content of blocks in blocks when their bottom padding/border overflows the page

Open drboone opened this issue 4 years ago • 15 comments

For a sample size of 1, v53.3 seems to not split paragraphs in orphan/widow handling? I have a two column layout where it moved a 10-line paragraph to the top of the second column, apparently since there was only a 9-line open space at the bottom of the first column. I don't really want to share the as-yet-unpublished doc here, but could make it available to developers privately.

drboone avatar Oct 20 '21 20:10 drboone

Actually, in looking further at this, v52.5 seems to do much the same thing. I just hadn't thought about it possibly being widow/orphan related.

drboone avatar Oct 20 '21 21:10 drboone

Hello,

The default value of widows and orphans is 2, meaning that the 10-line paragraph should be split in your case. Could you please share an example?

liZe avatar Oct 26 '21 13:10 liZe

I emailed a link to a zip file to your kozea email address.

drboone avatar Oct 26 '21 17:10 drboone

I emailed a link to a zip file to your kozea email address.

I don’t have this address anymore, could you please sent it to the one given in my profile instead?

liZe avatar Oct 27 '21 10:10 liZe

Done.

drboone avatar Oct 27 '21 15:10 drboone

I don’t know really why, but the problem is already fixed in the current master branch.

review_56748.pdf

Could you please check that it also works for you?

liZe avatar Nov 07 '21 09:11 liZe

I did this:

pip3 uninstall weasyprint
pip3 install -U html5lib tinycss2 fonttools cssselect2 Pillow Pyphen cffi pydyf pycparser webencodings brotli zopfli
pip3 install git+https://github.com/Kozea/WeasyPrint@ed20d94c49c986dacc9b71de9fa8322e666dbc7b

and then re-generated the pdf from the example kit I sent. In several places, it has no longer started a new column in a place that doesn't seem right, so that's an improvement. There is still one, at the bottom of page 15, column one, that I don't understand.

drboone avatar Nov 09 '21 19:11 drboone

I notice your version is different from the one I just generated. review_56748.pdf

drboone avatar Nov 09 '21 20:11 drboone

There is still one, at the bottom of page 15, column one, that I don't understand.

It’s caused by a limitation of our current (and naive) implementation of columns layout: we only break direct children of the columns, not their grandchildren. The situation will probably be improved in version 54 or 55, as we’re currently improving the different layouts (that’s why this bug is already solved 😄).

I notice your version is different from the one I just generated.

Yes, the text layout is a bit different because of small differences in text metrics, probably because of different versions or configurations of Fontconfig or Harfbuzz. The different renderings don’t look right or wrong, there’s nothing to be worried about in my opinion.

liZe avatar Nov 09 '21 20:11 liZe

It’s caused by a limitation of our current (and naive) implementation of columns layout: we only break direct children of the columns, not their grandchildren. The situation will probably be improved in version 54 or 55, as we’re currently improving the different layouts (that’s why this bug is already solved smile).

For my edification, could you explain what objects are children and grandchildren of the column in this instance? Also, do you have any sense of the timeframe for v54? Weeks? Months? Years?

Yes, the text layout is a bit different because of small differences in text metrics, probably because of different versions or configurations of Fontconfig or Harfbuzz. The different renderings don’t look right or wrong, there’s nothing to be worried about in my opinion.

Well, yes and no. I'm not worried about small differences like font metrics, and over all the document looks fine either way. But there are paragraphs that weren't split on pages 12 and 14 that appear to be examples of the same widow/orphan problem I originally reported.

drboone avatar Nov 09 '21 23:11 drboone

For my edification, could you explain what objects are children and grandchildren of the column in this instance?

The children of your columns are often div tags. When a div contains text, it can be split, but the blocks it includes (like the p tags for your footnotes) can’t be split.

Also, do you have any sense of the timeframe for v54?

Here are the open issues for version 54. The biggest issue, footnotes support, will be ready for the end of November. The version will thus probably be released in December.

Weeks? Months? Years?

Don’t be afraid. Our changelog can give you a pretty good hint: you won’t have to wait for years before getting a new version.

Well, yes and no. I'm not worried about small differences like font metrics, and over all the document looks fine either way. But there are paragraphs that weren't split on pages 12 and 14 that appear to be examples of the same widow/orphan problem I originally reported.

There were actually 2 different problems in your document:

  • In the main content of your documents, some column breaks were wrong. That was a bug that is now fixed.
  • In the footnotes, some column breaks are still wrong. This is caused by a limitation of our current column layout, and will be fixed later, in version 54 or in version 55.

liZe avatar Nov 10 '21 08:11 liZe

liZe, thank you!

drboone avatar Nov 10 '21 16:11 drboone

It appears I may be able to clean up the HTML I'm passing to weasyprint, to remove those basically extraneous <p> tags, which may help in the interim.

drboone avatar Nov 11 '21 00:11 drboone

Here is a small example showing what the problem is:

<style>
  @page { margin: 0; size: 3.5em }
  body { line-height: 1; orphans: 1; widows: 1; margin: 0 }
  p { margin: 0 }
  div { padding-bottom: 1em }
</style>
<body>
  abc def
  <div>
    <p>
      hij klm
      nop qrs
    </p>
  </div>
  tuv wxy
  zab
</body>

Here, the bottom padding of the div overflows. We should keep the first line of the p, but instead we discard the whole p block.

(Fun fact: this bug is caused by code that has been written in 2012.)

liZe avatar Nov 14 '21 09:11 liZe

046f27bf fixes the problem caused by the bottom padding of the div: when this padding overflows, the p is now re-rendered with a smaller space, and the end of the p is rendered on the next page (where the padding of the div will have enough space). The commit also fixes the problem given above.

But there’s one more problem, because your sample is a corner case of this corner case.

When the last child of a block has a bottom margin, and when the block has a border padding/border, we have to include adjoining margins that will not collapse with the next children (because of the padding/border). Currently, when this case occurs, and when the bottom of the page in where the child’s bottom margin should be, we discard the whole last child, instead of re-rendering it. Once more, the solution would be to render it again with less space, so that the beginning of the last child can be rendered on the current page, and the end of the last child (and its bottom margin, and its parent’s bottom padding/border) can be drawn on the next page.

Before fixing this problem, we have to check the rules given in the specification about margins overflowing pages.

liZe avatar Nov 14 '21 12:11 liZe

This bug is fixed with the recent improvements of multicolumn layout.

liZe avatar Sep 24 '22 04:09 liZe