When extracting a numbered list, the result is not as expected.

Open wencan opened this issue 1 year ago • 0 comments

Description of the bug

2024_5_.pdf

The PDF contains a numbered list whose item numbers begin with 0 (for example, 01). In the blocks returned by get_text('blocks'), each number appears on the line after its item text instead of in front of it.

Expected:

01 体重指数增⾼

Actual:

(59.562503814697266, 51.22419738769531, 151.6444549560547, 63.239990234375, '体重指数增⾼\n01\n', 1, 0)

Attached screenshot: Screenshot from 2024-05-21 02-00-12_

How to reproduce the bug

import fitz  # PyMuPDF

doc = fitz.open("2024_5_.pdf")
for page in doc:
    # Each block is a tuple: (x0, y0, x1, y1, text, block_no, block_type)
    blocks = page.get_text("blocks")
    for block in blocks:
        print(block)
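
A possible workaround, sketched under assumptions rather than confirmed against this PDF: the '\n' between the text and the number in the block output suggests the two are stored as separate lines, so they could be regrouped by vertical position and then ordered left to right. The helper name `join_lines_in_reading_order`, the `y_tolerance` value, and the per-line coordinates in the example are all hypothetical; only the block-level bounding box is known from the output above.

```python
from typing import List, Tuple

# One extracted line: (x0, y0, x1, y1, text). The layout is an assumption of this sketch.
Line = Tuple[float, float, float, float, str]


def join_lines_in_reading_order(lines: List[Line], y_tolerance: float = 3.0) -> List[str]:
    """Group lines with similar vertical centers, then order each group left to right."""
    rows = []  # each entry: [center_y_of_first_line, [lines in this row]]
    for line in sorted(lines, key=lambda l: (l[1] + l[3]) / 2):
        center = (line[1] + line[3]) / 2
        if rows and abs(center - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(line)  # same visual row as the previous line
        else:
            rows.append([center, [line]])  # start a new visual row
    return [
        " ".join(l[4].strip() for l in sorted(row, key=lambda l: l[0]))
        for _, row in rows
    ]


# Hypothetical per-line boxes: the number sits to the left of the text on the same row,
# but arrives second in extraction order, as in the reported block.
example = [
    (80.0, 51.2, 151.6, 63.2, "体重指数增⾼"),
    (59.6, 51.2, 75.0, 63.2, "01"),
]
result = join_lines_in_reading_order(example)
print(result)
```

With PyMuPDF, the per-line boxes could be taken from `page.get_text("dict")`, where each text block carries a "lines" list and each line has a "bbox" and a list of "spans" with "text". Whether the number and the item text really arrive as separate lines in this file has not been verified here.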

PyMuPDF version

1.24.4

Operating system

Linux

Python version

3.12

wencan, May 20 '24 18:05