
Whitespace handling in `Content::plain_text`

Status: Open. mattfbacon opened this issue 2 years ago • 10 comments

I initially noticed that for code like this:

= My heading \
that goes on two lines!

it would appear in the PDF outline as "My heading  that goes on two lines!". Note the double space.

I traced this back to `Content::plain_text`. I don't think this output is correct: if you joined the two lines by replacing "\\\n" with "", you would end up with only one space.

mattfbacon, Oct 25 '23 05:10
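To make the expected behavior concrete, here is a minimal Python sketch. It models the heading from the first example as a hypothetical flat list of pieces (text runs plus a line-break marker); this is an illustration only, not Typst's actual `Content::plain_text` implementation or content tree. Turning the line break into a space and concatenating reproduces the double space, while collapsing whitespace runs afterwards matches the reference of deleting the backslash-newline from the source.

```python
import re

# Hypothetical marker for a forced line break (`\` in Typst markup).
# This models heading content as a flat list; it is NOT Typst's real content tree.
LINEBREAK = object()


def naive_plain_text(pieces):
    """Turn every line break into a space and concatenate (reproduces the bug)."""
    return "".join(" " if p is LINEBREAK else p for p in pieces)


def collapsed_plain_text(pieces):
    """Same as above, but collapse runs of whitespace into a single space."""
    return re.sub(r"\s+", " ", naive_plain_text(pieces)).strip()


# Source text of the heading: "My heading \" followed by a newline.
source = "My heading \\\nthat goes on two lines!"
pieces = ["My heading ", LINEBREAK, "that goes on two lines!"]

print(repr(naive_plain_text(pieces)))      # 'My heading  that goes on two lines!'
print(repr(collapsed_plain_text(pieces)))  # 'My heading that goes on two lines!'
# The reference from the issue: delete the backslash-newline from the source.
print(repr(source.replace("\\\n", "")))    # 'My heading that goes on two lines!'
```

Collapsing all whitespace is only one possible policy; it would also flatten an intentional newline such as a `\u{000A}` escape into a space.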

I was just looking for another issue regarding content and whitespace and noticed that this one seems to be already fixed in typst 0.11.0 (2bf9f95d).

jgpr-code, Apr 16 '24 17:04

No, I can confirm that the issue still occurs with multiple spaces in the outline text.

mattfbacon, Apr 16 '24 17:04

[Two screenshots]

For me, it does not look like that.

jgpr-code, Apr 17 '24 05:04

Maybe you are using an older version of Typst than I am?

jgpr-code, Apr 17 '24 05:04

Yeah, the initial example was not entirely correct. Here is my test case:

#heading(level: 1, [My heading \
that goes on multiple lines])

mattfbacon, Apr 17 '24 07:04

[Screenshot]

Still works for me.

jgpr-code, Apr 17 '24 08:04

[Screenshot: outline showing "My heading  that goes on ..." (note the double space between "heading" and "that")]

This is the issue. Notice the double space.

mattfbacon, Apr 17 '24 19:04

This does not explain how you got to that state.

It would be easier with a minimal example that reproduces your issue; as shown above, the information here is still not enough to reproduce it.

I suspect that you have some other interaction going on which might or might not be an actual problem.

jgpr-code, Apr 18 '24 07:04

The minimal example is as given above. Look at the outline in a PDF viewer capable of showing it. There is a double space in the title.

You can show this in code using PyMuPDF:

>>> import fitz
>>> doc = fitz.open('/tmp/a.pdf') # PDF generated from typst code given above.
>>> toc = doc.get_toc()
>>> print(toc)
[[1, 'My heading  that goes on multiple lines', 1]]

mattfbacon, Apr 20 '24 10:04
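The PyMuPDF check above can be turned into a small reusable assertion. This sketch assumes the `[level, title, page]` entry shape that `Document.get_toc()` returns by default (as in the printed output above) and flags any title containing two or more consecutive spaces. The helper name `titles_with_repeated_spaces` is hypothetical; it takes the TOC list rather than a file so it can run without a PDF.

```python
import re

# Two or more consecutive ASCII spaces: the symptom reported in this issue.
REPEATED_SPACES = re.compile(r" {2,}")


def titles_with_repeated_spaces(toc):
    """Return the titles from a PyMuPDF-style TOC that contain repeated spaces.

    `toc` is assumed to be a list of [level, title, page] entries, the shape
    returned by fitz's Document.get_toc() with its default arguments.
    """
    return [title for _level, title, _page in toc if REPEATED_SPACES.search(title)]


# TOC copied from the pymupdf session above (no PDF needed to run this).
toc = [[1, 'My heading  that goes on multiple lines', 1]]
print(titles_with_repeated_spaces(toc))

# With a real file (requires PyMuPDF):
#   import fitz
#   doc = fitz.open('/tmp/a.pdf')
#   print(titles_with_repeated_spaces(doc.get_toc()))
```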

Maybe there is a difference between a line break and an actual newline character in PDF? At least I finally understand your issue, but I don't know enough about PDF or Typst to help you there.

In Typst, you could try using the newline character directly with a Unicode escape:

#outline()

#heading(level: 1, [
  My heading \
  that goes on multiple lines
])

#heading(level: 1, [
  This heading\u{000A}goes on multiple lines
])

Loading the resulting document with fitz code like this:

import fitz

pdf_document = "quick_tests.pdf"  # Replace with your actual PDF file path
doc = fitz.open(pdf_document)

page1 = doc.load_page(0)
page1text = page1.get_text()
print("Text from PDF: ", page1text)

toc = doc.get_toc()
print(toc)

gives me this output:

(.venv) C:\Users\26383\typst_issue>py test.py
Text from PDF:  Contents
My heading
that goes on multiple lines ..................................................................................................................................... 1
This heading
goes on multiple lines .............................................................................................................................................. 1
My heading
that goes on multiple lines
This heading
goes on multiple lines

[[1, 'My heading  that goes on multiple lines', 1], [1, 'This heading\ngoes on multiple lines', 1]]

jgpr-code, Apr 20 '24 14:04
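The two outline entries differ in exactly which whitespace characters they contain, which is easy to miss when reading the printed TOC. A minimal sketch for making that visible: it lists each whitespace run in a title as Unicode code points. The helper name `whitespace_runs` is hypothetical; the titles are copied from the `get_toc()` output above.

```python
import re


def whitespace_runs(title):
    """List each run of whitespace in `title` as a tuple of code points like 'U+0020'."""
    return [
        tuple(f"U+{ord(ch):04X}" for ch in match.group())
        for match in re.finditer(r"\s+", title)
    ]


# Titles copied from the get_toc() output above.
linebreak_title = 'My heading  that goes on multiple lines'  # heading written with `\`
newline_title = 'This heading\ngoes on multiple lines'       # heading written with \u{000A}

for title in (linebreak_title, newline_title):
    print(repr(title), whitespace_runs(title))
```

For these titles, the second whitespace run of the `\` heading is two U+0020 spaces, while the second run of the `\u{000A}` heading is a single U+000A line feed, consistent with a line break and a literal newline being treated differently when the outline text is produced.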