endesive: pdf.verify throws a UnicodeDecodeError on files modified with PyMuPDF

Open ksledz opened this issue 1 year ago • 1 comments

I experimented with PyMuPDF, endesive, and some signed PDFs and noticed that endesive's verify function does not work at all on various modified PDFs. (I first discovered this on PDFs with financial data, and then reproduced it on something generic, as seen below.) For example, using pdf-acrobat.pdf from the endesive repo, saved in the same directory as the script:

import pymupdf
doc = pymupdf.open('pdf-acrobat.pdf')
print(doc.get_sigflags())
page = doc[0]
rects = page.search_for("world")
page.add_highlight_annot(rects)
doc.save("output.pdf")

And then trying to verify it:

from endesive import pdf
data = open("output.pdf", "rb").read()
(hashok, signatureok, certok)= pdf.verify(data, None, None)
print("signature ok?", signatureok)
print("hash ok?", hashok)
print("cert ok?", certok)

Leaves a traceback:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[8], line 3
      1 from endesive import pdf
      2 data = open("output.pdf", "rb").read()
----> 3 (hashok, signatureok, certok)= pdf.verify(data, None, None)
      4 print("signature ok?", signatureok)
      5 print("hash ok?", hashok)

File ~/playground/.venv/lib/python3.12/site-packages/endesive/pdf/verify.py:14, in verify(pdfdata, certs, systemCertsPath)
     12 br = [int(i, 10) for i in pdfdata[start + 1 : stop].split()]
     13 contents = pdfdata[br[0] + br[1] + 1 : br[2] - 1]
---> 14 bcontents = bytes.fromhex(contents.decode("utf8"))
     15 data1 = pdfdata[br[0] : br[0] + br[1]]
     16 data2 = pdfdata[br[2] : br[2] + br[3]]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte

Aug 21 '24 19:08 ksledz

Verification of PDF correctness is very naive; in fact, it effectively does not exist. For example, there is no check that the given range covers the entire document. If no error occurs, everything "should" be OK, but any error should be treated as fatal.
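
To make the missing check concrete, here is a minimal sketch of what validating the byte range might look like. It is not endesive's code: `check_byte_range` is a hypothetical helper, it only looks at the first /ByteRange in the file, and it assumes the gap between the two ranges should hold nothing but the hex-encoded /Contents string in angle brackets.

```python
import re

# Hypothetical helper, not part of endesive: sanity-check the first
# /ByteRange in a PDF before trusting the offsets it contains.
BYTERANGE_RE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)
HEX_RE = re.compile(rb"[0-9A-Fa-f]*")


def check_byte_range(pdfdata: bytes) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    match = BYTERANGE_RE.search(pdfdata)
    if match is None:
        return ["no /ByteRange found"]
    start1, len1, start2, len2 = (int(g) for g in match.groups())

    problems = []
    if start1 != 0:
        problems.append("first range does not start at offset 0")
    if start2 < start1 + len1:
        problems.append("ranges overlap")
    if start2 + len2 != len(pdfdata):
        problems.append("ranges do not extend to the end of the file")

    # Assumption: the gap between the ranges is exactly <hex digits>.
    gap = pdfdata[start1 + len1 : start2]
    if len(gap) < 2 or gap[:1] != b"<" or gap[-1:] != b">":
        problems.append("gap between ranges is not delimited by < and >")
    elif HEX_RE.fullmatch(gap[1:-1]) is None:
        problems.append("gap between ranges is not a hex string")
    return problems
```

A caller could treat any non-empty result as a fatal verification failure instead of letting the hex decoding raise partway through.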

If you have the time and inclination, please add as many checks as you can. PRs are welcome.

Aug 24 '24 18:08 m32

I changed the verify function; the error you described won't happen anymore.

I also added the PDFVierifier class, which gives better control over the signature verification process (including signature verification in CSP and TSP).

Dec 15 '24 14:12 m32
