
Throw exception instead of arbitrarily chopping long words

Open C-Dargel opened this issue 6 months ago • 1 comments

Is your feature request related to a problem? Please describe.

If I see this correctly, BidiLine.processLine() handles placing of text in lines internally (e.g. for a ColumnText). If it encounters a long string that won't fit in the current line, it arbitrarily chops it into (at least) two pieces and fits the first in the current line, continuing with the rest in the next line. If my ColumnText would only fit one line, I could use the result of go() to determine if my string would overflow (NO_MORE_COLUMN). If it is just one word in a longer text that still fits entirely in my, say, ten possible lines, I have no way of knowing that something was possibly chopped until I see the result.

Describe the solution you'd like

I would prefer a simple error message telling me that the string "longestwordimaginable" does not fit the line width.

In my use case I would decrease the font size by one and try again until everything either fits in lines or I reach a given minimum font size limit (and then throw an exception to hand that information to the user).
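
The retry strategy described above could be sketched roughly as follows. This is a minimal illustration, not OpenPDF code: `ShrinkToFit`, `largestFittingSize`, and the `fits` predicate are hypothetical names, and the width model in `main` is a made-up stand-in for a real layout pass (for instance a simulated `ColumnText.go(true)` run combined with whatever "a word was chopped" signal the library exposes).

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

final class ShrinkToFit {

    /**
     * Tries font sizes from startSize down to minSize in steps of one and
     * returns the first size for which the fits check reports true.
     * Returns an empty OptionalInt if no size in the range fits.
     */
    static OptionalInt largestFittingSize(int startSize, int minSize, IntPredicate fits) {
        for (int size = startSize; size >= minSize; size--) {
            if (fits.test(size)) {
                return OptionalInt.of(size);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Hypothetical layout check (an assumption, not a real font metric):
        // the longest word has 22 characters, each half the font size wide,
        // and the column is 100 units wide.
        IntPredicate fits = size -> 22 * size * 0.5 <= 100.0;

        // Hand the failure to the caller once the minimum size is reached.
        int size = largestFittingSize(12, 6, fits)
                .orElseThrow(() -> new IllegalStateException(
                        "Text does not fit even at the minimum font size"));
        System.out.println("Using font size " + size);
    }
}
```

Keeping the layout check behind a predicate means the loop itself does not care whether the signal comes from an exception, a return code, or a post-check flag.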

Describe alternatives you've considered

I first thought I could use a custom SplitCharacter implementation to get the desired result, but that obviously did not work. Hyphenation is not feasible because, as of now, we have no way of knowing the language of the input text.
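
Another workaround that needs no library change might be to measure every word before layout and compare it with the column width. The sketch below assumes a caller-supplied width function; with OpenPDF this could plausibly be `BaseFont.getWidthPoint(word, fontSize)`, but `OverlongWordCheck` and `firstOverlongWord` are hypothetical names and the fixed-width metric in `main` is only a stand-in. It also ignores indents and custom SplitCharacter rules, so it approximates rather than reproduces what BidiLine does.

```java
import java.util.Optional;
import java.util.function.ToDoubleFunction;

final class OverlongWordCheck {

    /**
     * Returns the first whitespace-delimited word whose measured width
     * exceeds lineWidth, or Optional.empty() if every word fits.
     */
    static Optional<String> firstOverlongWord(String text, double lineWidth,
                                              ToDoubleFunction<String> widthOf) {
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty() && widthOf.applyAsDouble(word) > lineWidth) {
                return Optional.of(word);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in metric: every character is 6 units wide (an assumption,
        // not a real font metric).
        ToDoubleFunction<String> fixedWidth = w -> w.length() * 6.0;
        String text = "One such example is 'Desoxyribonukleinsäure', which is long.";
        System.out.println(firstOverlongWord(text, 100.0, fixedWidth));
    }
}
```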

Your real name Christopher Dargel

C-Dargel, Apr 29 '25 08:04

@C-Dargel Are you able to provide a test case that reproduces this? That will make it easier to test whether an enhancement addresses your exact issue.

csimoes1, Jun 18 '25 15:06

Sorry for answering late, I couldn't quite find the time until today. Now I could finally prepare a little something to illustrate what I mean. I created this small case, which demonstrates the issue:

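// Package names as used by OpenPDF 1.x/2.x; adjust them if your version differs.
import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.pdf.ColumnText;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public class LongWordDemo {
  public static void main(String[] args) {
    try (Document document = new Document()) {
      FileOutputStream stream = new FileOutputStream("test.pdf");
      PdfWriter writer = PdfWriter.getInstance(document, stream);
      document.open();
      PdfContentByte canvas = writer.getDirectContent();

      // Draw a thin rectangle so the column borders are visible.
      canvas.saveState();
      canvas.rectangle(10, 10, 100, 200);
      canvas.setLineWidth(0.1f);
      canvas.stroke();
      canvas.restoreState();

      // Lay out the text in a column that is 100 units wide.
      canvas.saveState();
      ColumnText column = new ColumnText(canvas);
      column.setSimpleColumn(10, 10, 110, 210);
      column.addText(new Chunk("German is a language known for long word compositions. One such example is 'Desoxyribonukleinsäure', which is German for deoxyribonucleic acid."));
      column.go();
      canvas.restoreState();

    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}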

The additional rectangle is just to make the borders of the ColumnText visible. The result looks like this:

Image

You can clearly see how the word 'Desoxyribonukleinsäure' has been cut apart at an arbitrary point in line 6 of the output. As initially stated, I would like to have control over exactly that behavior: for example, being able to turn it off and get an exception instead, which I could then handle as needed.

Thanks for your consideration!

C-Dargel, Jun 25 '25 09:06

I don't think OpenPDF should throw an exception in this case, because that would prevent the resulting PDF file from being created. In some cases, actually creating a PDF file is more important than how long words are wrapped.

andreasrosdal, Jun 25 '25 21:06

Pull requests with this are welcome:

Add an optional flag (e.g., setStrictWordWrapping(true)) that either:

- throws an exception if a word doesn’t fit, or
- exposes metadata (e.g., wasWordChopped()) for post-checks.

This preserves default behavior while supporting more sophisticated use cases.
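
To make the two options concrete, here is a toy sketch of how such an opt-in flag might behave. It is not OpenPDF code and does not mirror BidiLine: line capacity is counted in characters rather than measured widths, chopping always starts on a fresh line, and `StrictWrapSketch` and `WordTooLongException` are hypothetical names. Only `setStrictWordWrapping` and `wasWordChopped` are taken from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;

final class StrictWrapSketch {

    /** Hypothetical exception type, named here for illustration only. */
    static final class WordTooLongException extends RuntimeException {
        WordTooLongException(String word) {
            super("Word does not fit the line width: " + word);
        }
    }

    private final int maxChars;          // line capacity, in characters
    private boolean strict = false;      // default keeps the chopping behavior
    private boolean wordChopped = false; // metadata for post-checks

    StrictWrapSketch(int maxChars) { this.maxChars = maxChars; }

    void setStrictWordWrapping(boolean strict) { this.strict = strict; }

    /** Reports whether the most recent wrap() call had to chop a word. */
    boolean wasWordChopped() { return wordChopped; }

    List<String> wrap(String text) {
        wordChopped = false;
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // A word longer than a whole line can never fit without chopping.
            while (word.length() > maxChars) {
                if (strict) {
                    throw new WordTooLongException(word);
                }
                wordChopped = true;
                if (line.length() > 0) {
                    lines.add(line.toString());
                    line.setLength(0);
                }
                lines.add(word.substring(0, maxChars));
                word = word.substring(maxChars);
            }
            if (word.isEmpty()) {
                continue;
            }
            // Start a new line if the word plus a separating space does not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        StrictWrapSketch wrapper = new StrictWrapSketch(16);
        String text = "One such example is Desoxyribonukleinsäure in German";
        System.out.println(wrapper.wrap(text) + " chopped=" + wrapper.wasWordChopped());

        wrapper.setStrictWordWrapping(true);
        try {
            wrapper.wrap(text);
        } catch (WordTooLongException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the flag defaults to false, existing callers would see no change; only code that opts in has to handle the exception or query the post-check.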

andreasrosdal, Jun 26 '25 06:06

> I don't think openpdf should throw exception in this case, because throwing an exception here would prevent creating the resulting PDF file, in some cases actually creating a PDF file is more important than word wrapping on long words.

And not creating a PDF file is exactly what I would want, rather than incorrectly splitting long words and making the result look bad. Hence my suggestion of an option to turn the chopping off (not by default, mind you): it would preserve the current behavior while giving me the ability to react to the situation in whatever way is necessary.

C-Dargel, Jun 26 '25 06:06

Pull requests welcome

andreasrosdal, Jun 26 '25 14:06

Added a pull request for this issue. As you suggested, @andreasrosdal, I added a flag to ColumnText that defaults to the current behavior; if setStrictWordWrapping(true) is called, it throws a StrictWordWrapException instead of chopping the word.

csimoes1, Jun 26 '25 19:06

First of all, thank you, that's great news! Is that flag usable for Cells as well, as they use a ColumnText internally?

C-Dargel, Jun 27 '25 06:06

@C-Dargel, I don't know, maybe. You should try it out. This fix does address the test case you provided. Good luck.

csimoes1, Jun 27 '25 15:06

If I manipulate the cell's column by setting the flag, it should work when fitting text there. Again, thanks for addressing this so quickly!

C-Dargel, Jun 30 '25 06:06