LayoutProcessor with inline images generates incorrect text offsets

Open ivinv opened this issue 1 year ago • 1 comments

Describe the bug

Adding a chunk containing an image to a paragraph leads to incorrect positioning of the lines.

To Reproduce

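import java.io.FileOutputStream;

import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.LayoutProcessor;
import com.lowagie.text.pdf.PdfWriter;

public static void test() throws Exception {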
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("image.pdf"));
    writer.setInitialLeading(16.0f);
    document.open();
    LayoutProcessor.enableKernLiga();

    Paragraph p = new Paragraph();
    p.add(new Chunk("\n"));
    p.add(new Chunk("Test of several Chunks on one line: A͜ "));
    Image img = Image.getInstance("images/test.png");
    img.scaleToFit(80, 50);
    p.add(new Chunk(img, 0, 0));
    p.add(new Chunk("A̋C̀C̄C̆C̈"));
    p.add(new Chunk("C̈C̕C̣C̦C̨̆"));
    p.add(new Chunk(".\n"));
    p.add(new Chunk("Another line: S"));
    p.add(new Chunk("Ṣ̄ṣ̄Ṭ̄ṭ̄Ạ̈ạ̈Ọ̈ọ̈Ụ̄Ụ̈ụ̄ụ̈"));
    p.add(new Chunk("j́S̛̄s̛̄K̛"));
    p.add(new Chunk(".\n"));

    document.add(p);
    document.close();
}

Expected behavior

It should be possible to use inline images and combining Unicode characters in the same paragraph.
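
For readers unfamiliar with why the test strings stress the layout code, here is a minimal sketch in plain Java (no OpenPDF calls). It assumes the reproduction uses sequences such as A followed by U+030B (COMBINING DOUBLE ACUTE ACCENT); the class and method names are hypothetical. It shows that such a sequence has no precomposed form, so it stays as a base letter plus a separate mark even after NFC normalization, and the mark has to be positioned over the base glyph at layout time.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of OpenPDF.
final class CombiningMarkDemo {

    // Counts code points of Unicode category Mn (non-spacing mark).
    static int countNonSpacingMarks(String text) {
        return (int) text.codePoints()
                .filter(cp -> Character.getType(cp) == Character.NON_SPACING_MARK)
                .count();
    }

    // Number of code points left after NFC (canonical composition).
    static int codePointsAfterNfc(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        String composable = "e\u0301";   // e + U+0301, composes to a single code point
        String sequenceOnly = "A\u030B"; // A + U+030B, no precomposed form exists

        System.out.println(codePointsAfterNfc(composable));     // 1
        System.out.println(codePointsAfterNfc(sequenceOnly));   // 2
        System.out.println(countNonSpacingMarks(sequenceOnly)); // 1
    }
}
```

Because normalization cannot collapse these sequences, a paragraph like the one above only renders correctly when mark positioning works for every chunk on the line, including the chunks that follow the image.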

Screenshots

[screenshot: test]

System

  • OS: Fedora
  • Used Font: NotoSans

ivinv, Feb 16 '24 01:02

Thank you for reporting. Pull requests to fix this are welcome.

andreasrosdal, Feb 16 '24 07:02

A new beta version of LayoutProcessor is available for testing at https://github.com/vk-github18/OpenPDF-vk2. Inline images should work with this version.

I also changed src/test/java/com/lowagie/text/pdf/TextExtractTest.java with the assumption that spaces are not significant for PDF text extraction.

@ivinv Please report your test results.

vk-github18, Mar 09 '24 11:03

> I also changed src/test/java/com/lowagie/text/pdf/TextExtractTest.java with the assumption that spaces are not significant for PDF text extraction.

That is a weird assumption. In general, spaces are significant for PDF text extraction.

Thus, either there is something special about the test file in question, or the test had been broken before, or you have a regression.
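
To make the difference concrete, here is a minimal sketch in plain Java of a strict comparison versus a whitespace-insensitive one. The class and method names are hypothetical and are not taken from TextExtractTest; the strings are made-up examples, not output of any extractor.

```java
// Hypothetical helper, not part of OpenPDF or its test suite.
final class ExtractedTextCompare {

    // Removes every whitespace code point (spaces, tabs, line breaks).
    static String stripWhitespace(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Lenient check: true if the texts match once all whitespace is dropped.
    static boolean equalIgnoringWhitespace(String expected, String actual) {
        return stripWhitespace(expected).equals(stripWhitespace(actual));
    }

    public static void main(String[] args) {
        String expected = "Hello World";
        String missingSpace = "HelloWorld";   // word boundary lost
        String spuriousSpace = "Hel lo World"; // word split in two

        // Strict comparison flags both deviations.
        System.out.println(expected.equals(missingSpace));  // false
        System.out.println(expected.equals(spuriousSpace)); // false

        // Lenient comparison accepts both, so it cannot detect either problem.
        System.out.println(equalIgnoringWhitespace(expected, missingSpace));  // true
        System.out.println(equalIgnoringWhitespace(expected, spuriousSpace)); // true
    }
}
```

A test written with the lenient check still catches wrong or missing letters, but it passes when word boundaries are lost or invented, which is exactly the kind of regression a layout change could introduce.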

mkl-public, Mar 10 '24 08:03

See https://github.com/LibrePDF/OpenPDF/issues/1098

vk-github18, Mar 10 '24 12:03

> See #1098

Ah, so it's not a general assumption but merely referring to a single test and test file with questionable test text anyways. Ok.

mkl-public, Mar 10 '24 22:03

Sadly, the test coverage in OpenPDF is not that good. I think some tests were written by simply assuming that the current behavior is correct, rather than by asserting the behavior that would actually be correct.

asturio, Mar 16 '24 19:03