Garbled text when extracting a Chinese PDF
Description of the bug
Pythonܔزၥᦶຝҁ෫႕҂
ܔزၥᦶ༷ᬿ
Pythonၥᦶຝ
կຝጱଶ᧔҅ၥᦶ๋᯿ᥝጱྍṈฎࣁկݎጱײኴفྲঅ҅ಅզࣁ๗ၥᦶጱኴف҅
կᕪၧጱଶӤ᧔҅ݎሿጱᳯ᷌ᥴ٬౮֗҅ಭفጱᩒრྲ̶ࢩྌ҅Ӟӻၥᦶጱᔮᕹ҅
ত๋֯ጱၥᦶ੪ฎრդᎱᕆڦጱၥᦶ҅Ԟ੪ฎܔزၥᦶᴤྦྷ҅ᬯӻᬦᑕԞᤩ౮ԅጮፋၥᦶ̶ܔزၥᦶ
ฎ๋चԞฎ๋ବ੶ጱၥᦶᔄࣳ҅ܔزၥᦶଫአԭ๋चጱկդᎱ҅ইᔄ҅ڍහ̶ොဩᒵ҅ܔزၥᦶ
᭗ᬦݢಗᤈጱෙ༄ັᤩၥܔزጱᬌڊฎވჿ᪃ᶼ๗ᕮຎ̶ࣁၥᦶᰂਁरጱቘᦞӤ᧔҅᩼ஃӥጱၥᦶ
ಭفᩒრ᩼ṛ҅کጱࢧಸሲ᩼य़҅ᥠၥᦶᰂਁरཛྷࣳғ
ಲկຝጱ੶ᶎ҅ࣁᛔۖ۸ၥᦶጱ֛ᔮӾ҅ܔزၥᦶຝզ݊ܔزၥᦶጱᎣᦩ֛ᔮฎᶳᥝഩൎጱ
ದᚆԏӞ҅ܔزၥᦶጱᎣᦩ֛ᔮฎᛔۖ۸ၥᦶૡᑕզ݊ၥᦶݎૡᑕጱᎣᦩ֛ᔮԏӞ҅ᘒӬฎᶳ
ٍ॓ጱᎣᦩԏӞ̶ࣁPythonӾଫአ๋ଠာጱܔزၥᦶຝฎunittestpytest,unittestંԭຽٵପ҅
ݝᥝਞᤰԧPythonᥴ᯽ݸ੪ݢզፗളفֵአԧ,pytestฎᒫӣොጱପ҅ᵱᥝܔᇿጱਞᤰ̶ܔزၥᦶ
ຝጱᎣᦩ֛ᔮ੪ࢱᕰunittestpytestᦖᥴ̶
ጮፋၥᦶܻቘ
PDF file: Python单元测试框架.pdf
How to reproduce the bug
Parsing the PDF file produces garbled text.
PyMuPDF version
1.23.x or earlier
Operating system
Linux
Python version
3.11
Please describe in English!
Using this tool to parse Chinese PDF documents results in garbled characters. Could you please take a look? Thank you very much.
PDF document: Python单元测试框架.pdf
This PDF is full of errors - see the following log during open:
import pymupdf
doc = pymupdf.open("Python (1).pdf")
print(pymupdf.TOOLS.mupdf_warnings())
format error: cannot recognize xref format
trying to repair broken xref
repairing PDF document
Bad or missing parent pointer in outline tree, repairing
... repeated 4 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 3 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 3 times...
Bad or missing prev pointer in outline tree, repairing
Bad or missing parent pointer in outline tree, repairing
... repeated 2 times...
Bad or missing prev pointer in outline tree, repairing
Even after reducing the document to just its first page and saving it, no PDF viewer or extraction tool can extract meaningful text from the result.
doc.select([0])
doc.ez_save("page1.pdf")
The text can be extracted with pypdfium2 (https://github.com/pypdfium2-team/pypdfium2). Could you please take a look? Thank you very much.
Sorry - as I wrote: this file has severe defects. Whether some tools may still be able to extract things despite this is outside the scope of what we can deal with.
OK, thank you very much.
How can I determine whether this PDF has errors? Is there a corresponding API? Thank you very much
Some errors are already detected when the PDF is opened - like in this case, where the central cross reference (xref) table is broken. MuPDF will then try to repair things by generating a new xref table from walking through the full file. This is usually accompanied by error and warning messages. Some of those are written to the console; the full set of messages is also stored and can be retrieved with pymupdf.TOOLS.mupdf_warnings() - as shown.
Whether a repair was attempted can be determined by checking doc.is_repaired.
Not all errors can be detected at open time though. Some only show up when certain information, such as text, is extracted, or when the pages' visual appearance is rendered.
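The checks described above could be combined into a small helper. This is only a sketch, not an official PyMuPDF API: the names `PdfHealth`, `inspect_document`, and `check_pdf` are made up for illustration. It relies on `doc.is_repaired`, `page.get_text()`, `pymupdf.TOOLS.reset_mupdf_warnings()`, and `pymupdf.TOOLS.mupdf_warnings()`. An empty report does not prove the file is sound, since some defects only surface during rendering.

```python
from dataclasses import dataclass, field


@dataclass
class PdfHealth:
    """Hypothetical summary of problems noticed while reading a PDF."""
    repaired: bool                                    # value of doc.is_repaired
    warnings: list = field(default_factory=list)      # MuPDF messages, one per entry
    page_errors: dict = field(default_factory=dict)   # page index -> error text

    @property
    def ok(self) -> bool:
        # "ok" only means nothing was noticed, not that the file is valid.
        return not (self.repaired or self.warnings or self.page_errors)


def inspect_document(doc, warnings_text: str) -> PdfHealth:
    """Try text extraction on every page and collect what went wrong."""
    page_errors = {}
    for index, page in enumerate(doc):
        try:
            page.get_text()  # some defects only show up during extraction
        except Exception as exc:
            page_errors[index] = str(exc)
    messages = [line for line in warnings_text.splitlines() if line.strip()]
    return PdfHealth(bool(doc.is_repaired), messages, page_errors)


def check_pdf(path: str) -> PdfHealth:
    """Open a PDF with PyMuPDF and report repairs, warnings and page errors."""
    import pymupdf  # imported here so the helpers above work without it

    pymupdf.TOOLS.reset_mupdf_warnings()  # drop messages from earlier documents
    doc = pymupdf.open(path)
    health = inspect_document(doc, "")
    # Read the message store last, so it also covers messages emitted
    # while the page text was being extracted.
    stored = pymupdf.TOOLS.mupdf_warnings()
    health.warnings = [line for line in stored.splitlines() if line.strip()]
    doc.close()
    return health
```

For the file in this issue, `check_pdf("Python (1).pdf")` would be expected to report `repaired` as true together with the outline-tree messages shown in the log above.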
ok, Thank you very much!