OpenPDF LayoutProcessor with inline images generates incorrect text offsets

Describe the bug

Adding a chunk containing an image to a paragraph leads to incorrect positioning of the lines.

To Reproduce

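import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.LayoutProcessor;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;

public static void test() throws Exception {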
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("image.pdf"));
    writer.setInitialLeading(16.0f);
    document.open();
    LayoutProcessor.enableKernLiga();

    Paragraph p = new Paragraph();
    p.add(new Chunk("\n"));
    p.add(new Chunk("Test of several Chunks on one line: A͜ "));
    Image img = Image.getInstance("images/test.png");
    img.scaleToFit(80, 50);
    p.add(new Chunk(img, 0, 0));
    p.add(new Chunk("A̋C̀C̄C̆C̈"));
    p.add(new Chunk("C̈C̕C̣C̦C̨̆"));
    p.add(new Chunk(".\n"));
    p.add(new Chunk("Another line: S"));
    p.add(new Chunk("Ṣ̄ṣ̄Ṭ̄ṭ̄Ạ̈ạ̈Ọ̈ọ̈Ụ̄Ụ̈ụ̄ụ̈"));
    p.add(new Chunk("j́S̛̄s̛̄K̛"));
    p.add(new Chunk(".\n"));

    document.add(p);
    document.close();
}

Expected behavior

It should be possible to use inline images and combined Unicode characters in the same paragraph.

Screenshots

[image: test]

System

OS: Fedora
Used Font: NotoSans

Feb 16 '24 01:02 ivinv

Thank you for reporting. Pull requests to fix this are welcome.

Feb 16 '24 07:02 andreasrosdal

A new beta version of LayoutProcessor is available for test at https://github.com/vk-github18/OpenPDF-vk2. Inline images should work with this version.

I also changed src/test/java/com/lowagie/text/pdf/TextExtractTest.java with the assumption that spaces are not significant for PDF text extraction.

@ivinv Please report your test results.

Mar 09 '24 11:03 vk-github18

> I also changed src/test/java/com/lowagie/text/pdf/TextExtractTest.java with the assumption that spaces are not significant for PDF text extraction.

That is a weird assumption. In general spaces are significant for PDF text extraction.

Thus, either there is something special about the test file in question, or the test had been broken before, or you have a regression.
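
To make the concern concrete, here is a minimal sketch of what a whitespace-insensitive comparison does. The class and helper names are hypothetical and are not taken from TextExtractTest; the sample strings are made up for illustration. It only shows that once whitespace is stripped, a test can no longer tell correctly spaced text from text with missing or misplaced spaces.

```java
class WhitespaceCompare {

    // Hypothetical helper: removes ASCII whitespace (space, tab, line breaks).
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Hypothetical helper: compares two strings after stripping whitespace.
    static boolean equalsIgnoringWhitespace(String expected, String actual) {
        return stripWhitespace(expected).equals(stripWhitespace(actual));
    }

    public static void main(String[] args) {
        String expected = "Another line: S.";
        // Made-up extraction result in which all spaces were lost.
        String extracted = "Anotherline:S.";

        // Strict comparison notices the missing spaces.
        System.out.println(expected.equals(extracted)); // prints false

        // Whitespace-insensitive comparison hides them.
        System.out.println(equalsIgnoringWhitespace(expected, extracted)); // prints true
    }
}
```

A relaxed comparison like this may be acceptable for a single test file, but it would mask a regression in which the extractor drops or misplaces word breaks.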

Mar 10 '24 08:03 mkl-public

See https://github.com/LibrePDF/OpenPDF/issues/1098

Mar 10 '24 12:03 vk-github18

> See #1098

Ah, so it's not a general assumption but merely referring to a single test and test file with questionable test text anyways. Ok.

Mar 10 '24 22:03 mkl-public

Sadly, the test coverage in OpenPDF is not that good. I think some tests were just written on the assumption that the actual behavior was correct, rather than checking for truly correct behavior.

Mar 16 '24 19:03 asturio
