endesive
pdf.verify throws a UnicodeDecodeError on files modified with PyMuPDF
I experimented with PyMuPDF, endesive and some signed PDFs, and noticed that endesive's verify function does not work at all on various modified PDFs. (I first discovered this on PDFs with financial data, and then reproduced it on something generic, as seen below.) For example, using pdf-acrobat.pdf from the endesive repo, saved in the same directory as the script:
import pymupdf
doc = pymupdf.open('pdf-acrobat.pdf')
print(doc.get_sigflags())
page = doc[0]
rects = page.search_for("world")
page.add_highlight_annot(rects)
doc.save("output.pdf")
And then trying to verify it:
from endesive import pdf
data = open("output.pdf", "rb").read()
(hashok, signatureok, certok)= pdf.verify(data, None, None)
print("signature ok?", signatureok)
print("hash ok?", hashok)
print("cert ok?", certok)
This produces the following traceback:
UnicodeDecodeError Traceback (most recent call last)
Cell In[8], line 3
1 from endesive import pdf
2 data = open("output.pdf", "rb").read()
----> 3 (hashok, signatureok, certok)= pdf.verify(data, None, None)
4 print("signature ok?", signatureok)
5 print("hash ok?", hashok)
File ~/playground/.venv/lib/python3.12/site-packages/endesive/pdf/verify.py:14, in verify(pdfdata, certs, systemCertsPath)
12 br = [int(i, 10) for i in pdfdata[start + 1 : stop].split()]
13 contents = pdfdata[br[0] + br[1] + 1 : br[2] - 1]
---> 14 bcontents = bytes.fromhex(contents.decode("utf8"))
15 data1 = pdfdata[br[0] : br[0] + br[1]]
16 data2 = pdfdata[br[2] : br[2] + br[3]]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte
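One possible reading of this traceback, which is an assumption and not something confirmed in this thread: the non-incremental `doc.save()` rewrites the file, so the offsets stored in `/ByteRange` no longer point at the hex `/Contents` string. The sketch below repeats the slicing shown in the traceback on synthetic bytes. `build_signed_like` and `extract_contents` are hypothetical helpers written for illustration; they are not part of endesive.

```python
import re


def build_signed_like(sig_hex: str, tail: str = " >>\n%%EOF\n") -> bytes:
    """Build a tiny PDF-like byte string with a self-consistent /ByteRange."""
    def render(a: int, b: int, c: int, d: int) -> str:
        # Fixed-width numbers keep the header length independent of the values.
        return (f"%PDF-1.7\n<< /ByteRange [{a:010d} {b:010d} {c:010d} {d:010d}]"
                " /Contents ")

    first_len = len(render(0, 0, 0, 0))           # bytes before the '<'
    second_start = first_len + len(sig_hex) + 2   # skip '<', the hex, and '>'
    head = render(0, first_len, second_start, len(tail))
    return (head + "<" + sig_hex + ">" + tail).encode("ascii")


def extract_contents(pdfdata: bytes) -> bytes:
    """Same slicing as lines 12-14 of verify.py in the traceback above."""
    m = re.search(rb"/ByteRange\s*\[([^\]]*)\]", pdfdata)
    if m is None:
        raise ValueError("no /ByteRange found")
    br = [int(i, 10) for i in m.group(1).split()]
    contents = pdfdata[br[0] + br[1] + 1 : br[2] - 1]
    return bytes.fromhex(contents.decode("utf8"))


intact = build_signed_like("deadbeef")

# Stand-in for a rewritten file: binary data now sits at the old offsets,
# while the /ByteRange numbers themselves are unchanged (stale).
rewritten = b"\x8e" * len(intact) + intact

print(extract_contents(intact).hex())
try:
    extract_contents(rewritten)
except UnicodeDecodeError as exc:
    print("stale offsets:", exc)
```

If this reading is right, the error says nothing about the signature itself; it only shows that the slice taken at the recorded offsets is not a hex string.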
Verification of PDF correctness is very naive; in fact, it effectively does not exist. For example, there is no check that the given byte range covers the entire document, ... If no error occurs, everything "should" be OK, but any error should be treated as fatal.
If you have the time and inclination, please add as many checks as you can. PRs are welcome.
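As a rough illustration of the kind of check mentioned above, here is a minimal sketch that validates `/ByteRange` before any signature work is attempted. It assumes a single signature whose range should span the whole file (documents with several signatures added incrementally would need a different rule), and `check_byte_range` is a hypothetical name, not an endesive function.

```python
import re


def check_byte_range(pdfdata: bytes) -> list[int]:
    """Return the four /ByteRange numbers, or raise ValueError if they look wrong."""
    m = re.search(
        rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]", pdfdata
    )
    if m is None:
        raise ValueError("no /ByteRange entry found")
    start1, len1, start2, len2 = (int(g) for g in m.groups())

    if start1 != 0:
        raise ValueError("first range does not start at byte 0")
    if start2 <= start1 + len1:
        raise ValueError("ranges overlap or leave no room for /Contents")
    if start2 + len2 != len(pdfdata):
        raise ValueError("second range does not end at the end of the file")

    # The only bytes left uncovered should be the '<hex>' /Contents string.
    gap = pdfdata[start1 + len1 : start2]
    if not (gap.startswith(b"<") and gap.endswith(b">")):
        raise ValueError("uncovered bytes are not a <...> hex string")
    try:
        bytes.fromhex(gap[1:-1].decode("ascii"))
    except ValueError as exc:  # UnicodeDecodeError is a ValueError subclass
        raise ValueError("/Contents is not valid hex") from exc

    return [start1, len1, start2, len2]
```

In line with the comment above, a caller would treat any `ValueError` raised here as fatal and report the document as unverified instead of continuing.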
I changed the verify function; the error you described should no longer occur.
I also added the PDFVierifier class which gives better control over the signature verification process (including signature verification in CSP and TSP)