PyMuPDF
When extracting a numbered list, the result is not as expected.
Description of the bug
The PDF contains a numbered list whose numbers start with 0 (for example, 01). In the blocks returned by page.get_text('blocks'), each number appears on the line after its item text instead of in front of it.
Expected:
01 体重指数增⾼
Actual:
(59.562503814697266, 51.22419738769531, 151.6444549560547, 63.239990234375, '体重指数增⾼\n01\n', 1, 0)
How to reproduce the bug
import fitz

doc = fitz.open("2024_5_.pdf")
for page in doc:
    # Each block is a tuple: (x0, y0, x1, y1, text, block_no, block_type)
    blocks = page.get_text('blocks')
    for block in blocks:
        print(block)
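To help narrow this down, here is a minimal sketch of a possible workaround that rebuilds lines from word positions instead of relying on the order inside a block. It assumes the number and its text sit at roughly the same height on the page, which has not been verified for this PDF. The helper name words_to_visual_lines, the y_tol tolerance, and the sample coordinates are hypothetical; the input is expected to have the tuple layout of page.get_text("words"), i.e. (x0, y0, x1, y1, word, block_no, line_no, word_no).

```python
from typing import Iterable, List, Tuple

# Same layout as the tuples returned by page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]


def words_to_visual_lines(words: Iterable[Word], y_tol: float = 3.0) -> List[str]:
    """Group words into lines by vertical midpoint, then order each line by x0.

    Hypothetical helper: y_tol is an assumed tolerance in points and may need
    tuning for a given document.
    """
    rows = []  # each entry: [reference_y_mid, [words in that row]]
    # Visit words from top to bottom so each row is built contiguously.
    for w in sorted(words, key=lambda w: ((w[1] + w[3]) / 2, w[0])):
        y_mid = (w[1] + w[3]) / 2
        if rows and abs(y_mid - rows[-1][0]) <= y_tol:
            rows[-1][1].append(w)
        else:
            rows.append([y_mid, [w]])
    # Within a row, read left to right.
    return [
        " ".join(w[4] for w in sorted(row, key=lambda w: w[0]))
        for _, row in rows
    ]


if __name__ == "__main__":
    # Made-up coordinates that mimic the reported order (text first, number second).
    sample = [
        (77.0, 51.2, 151.6, 63.2, "text", 0, 0, 0),
        (59.5, 52.0, 72.0, 62.5, "01", 0, 1, 0),
    ]
    print(words_to_visual_lines(sample))
```

With a real document, the call would be words_to_visual_lines(page.get_text("words")) for each page. If the number and the text turn out to have clearly different vertical positions, this grouping will not merge them, and that difference would itself be useful information for the report.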
PyMuPDF version
1.24.4
Operating system
Linux
Python version
3.12