Redacting results are not as expected in 1.24.x.

Open Marerc opened this issue 10 months ago • 7 comments

Description of the bug

Hi there! Thanks for the excellent software for manipulating PDF files.

I have encountered the same issue as #3375. I am using PyMuPDF to remove some information from a commercial report through code, as shown below:

from typing import List
import os

import fitz
from fitz import Page, Rect


def handlePDF(doc):
    # All rectangles are in PDF points and assume A4 pages (595 x 842).
    content_footer_rect = Rect(0, 772, 595, 842)  # bottom 70 points of a body page
    content_header_rect = Rect(0, 0, 595, 70)     # top 70 points of a body page
    cover_footer_rect = Rect(0, 600, 595, 842)    # lower part of the cover page
    cover_upper_rect = Rect(0, 0, 595, 140)       # upper part of the cover page

    print("Handling PDF from Jianbo")
    print("Handling cover page")
    p0: Page = doc[0]
    p0.add_redact_annot(cover_upper_rect)
    p0.add_redact_annot(cover_footer_rect)
    p0.apply_redactions()

    # Note: this loop visits every page, including the cover page again.
    print("Handling body")
    for page in doc:
        page.add_redact_annot(content_header_rect)
        page.add_redact_annot(content_footer_rect)
        page.apply_redactions()

    print("Handling tailpage")
    last_page = doc[len(doc) - 1]
    copyright_positions: List[Rect] = last_page.search_for("Copyright BGI All Rights Reserved")
    for rect in copyright_positions:
        # Redact a full-width band around each copyright line.
        last_page.add_redact_annot(Rect(0, rect.y0 - 10, 595, rect.y0 + 30))
        last_page.apply_redactions()
    return doc

folder = os.path.dirname(__file__)
handlePDF(fitz.open(os.path.join(folder, "test.pdf"))).save(os.path.join(folder, "output.pdf"))
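
The script hard-codes A4 coordinates (595 x 842 points). As a minimal sketch, and not part of the original report, the header and footer bands could be derived from a page's width and height instead. The helper name margin_bands is hypothetical, and the sketch assumes the page origin is at (0, 0).

```python
def margin_bands(width, height, header, footer):
    """Return (header_band, footer_band) as (x0, y0, x1, y1) tuples.

    Hypothetical helper: assumes the page's top-left corner is (0, 0)
    and that all values are in PDF points.
    """
    if header < 0 or footer < 0 or header + footer > height:
        raise ValueError("header and footer bands must fit inside the page")
    header_band = (0.0, 0.0, float(width), float(header))
    footer_band = (0.0, float(height - footer), float(width), float(height))
    return header_band, footer_band


# A4 page with 70-point bands, matching the body-page rectangles in the report.
header_band, footer_band = margin_bands(595, 842, 70, 70)
```

With PyMuPDF, the width and height could be taken from page.rect.width and page.rect.height, and each tuple passed on as Rect(*header_band).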

[Two screenshots attached]

How to reproduce the bug

bug-report.tar.gz

  1. Download bug-report.tar.gz, extract, and cd to bug-report.

  2. Create a Python virtual environment with PyMuPDF 1.24.0 or 1.24.1 installed.

  3. Activate the virtual environment and run python test.py. You will see the result I reported above.

PyMuPDF version

1.24.1

Operating system

Linux

Python version

3.11

Marerc avatar Apr 12 '24 09:04 Marerc

This has nothing to do with #3375. It seems to be a MuPDF issue. I will prepare a report to their system.

JorjMcKie avatar Apr 12 '24 11:04 JorjMcKie

This 1-page file contains a redaction annotation which will ruin the appearance when applied. problem.pdf

JorjMcKie avatar Apr 12 '24 11:04 JorjMcKie

MuPDF issue reference: https://bugs.ghostscript.com/show_bug.cgi?id=707733

JorjMcKie avatar Apr 12 '24 11:04 JorjMcKie

The MuPDF team has resolved the problem. The fix will be rolled out with one of the next versions.

JorjMcKie avatar Apr 12 '24 18:04 JorjMcKie

> The MuPDF team has resolved the problem. The fix will be rolled out with one of the next versions.

Wow, @JorjMcKie, thank you very much for your response and your efforts to solve this problem. I will wait for the next release.

BTW, I have edited my issue description to fix its grammar and spelling errors.

Marerc avatar Apr 15 '24 01:04 Marerc

> This has been fixed in v1.24.2.

Hi, JorjMcKie. Thank you very much for your efforts in resolving this problem.

Unfortunately, after I upgraded PyMuPDF and reran the code in https://github.com/pymupdf/PyMuPDF/issues/3376#issue-2239590902, the PDF appearance is still ruined. :(

$ git clone https://github.com/pymupdf/PyMuPDF.git && cd PyMuPDF && pip install .
$ pip list | grep PyMuPDF
PyMuPDF            1.24.2
PyMuPDFb           1.24.1
$ cd bug-report && python test.py

Is it because the upstream fix has not been released yet?

Marerc avatar Apr 23 '24 01:04 Marerc

Sorry - my bad: the MuPDF fix has not been included in this version. We will have to wait ...

JorjMcKie avatar Apr 23 '24 12:04 JorjMcKie

Fixed in 1.24.3.
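
For anyone checking whether an environment already carries the fix, here is a minimal sketch that compares a version string against 1.24.3, the release named above. The helper names (parse_version, has_redaction_fix, installed_version) are hypothetical, and the distribution name "PyMuPDF" is taken from the pip list output earlier in this thread.

```python
import re
from importlib import metadata

FIXED_IN = (1, 24, 3)  # release reported as fixed in this thread


def parse_version(text):
    """Turn '1.24.2' into (1, 24, 2); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_redaction_fix(version_text, fixed_in=FIXED_IN):
    """Return True if the given version is at least the fixed release."""
    return parse_version(version_text) >= fixed_in


def installed_version(dist="PyMuPDF"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("PyMuPDF is not installed")
else:
    print(version, "includes the fix:", has_redaction_fix(version))
```

This only inspects the Python package version. As the 1.24.2 attempt above shows, what matters is whether the bundled MuPDF contains the fix, so rerunning test.py remains the real check.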