[Feature request] Improved auto redaction to remove text behind box without image conversion

Open ChrLau opened this issue 2 years ago • 5 comments

Hi,

I just installed Stirling-PDF (App Version: 0.15.1) via Docker and played around with its features a bit, and I absolutely love it! But when using the Auto Redact feature I had some ideas on how it could be improved.

The "Auto Redact" feature currently replaces the entered words/RegEx matches with a solid box and converts the whole PDF to an image, to prevent selection of the text behind the box. While this is a rock-solid approach in terms of security, it can make working with redacted PDFs harder, as you can't select/copy any text anymore. (Yes, I'm aware the OCR feature exists. 😏 But not everyone uses Stirling or has access to such tools.)

Instead the following alternative redaction solutions came to my mind:

  1. Remove the redacted text, only place the solid box in its place (keep spacing, layout, etc.)
  2. Replace redacted text with the word REDACTED (like in those redacted, official documents shown on TV 😄), without any box at all.
  3. Combination of 1 and 2: Replace the redacted text with the word REDACTED and additionally place the box in its place

Sadly I don't know enough about the PDF file format to know if this is achievable or to estimate how work-intensive this will be. But nevertheless I wanted to share my idea.

Thanks!

ChrLau avatar Dec 05 '23 22:12 ChrLau

I wish I could do this and it is certainly a good idea!

But text editing in PDFs is hard. It has proven quite difficult to edit, remove or replace text within PDFs, which is why, despite all of Stirling's features, you still can't just edit text.

Frooodle avatar Dec 05 '23 23:12 Frooodle

I will keep this issue ticket open, however, as I do want to give it some more tries in the future.

Frooodle avatar Dec 05 '23 23:12 Frooodle

Yai! That's all I asked for. 😄

ChrLau avatar Dec 06 '23 00:12 ChrLau

Hi, any update on this feature?

HugoFollic avatar Apr 11 '24 13:04 HugoFollic

Hey, I know this isn't a Python project, but I know PyMuPDF can do this. Here is a code snippet:

import fitz  # PyMuPDF
doc = fitz.open("invoice-7-2024-06-01.pdf")

for i in range(doc.page_count):
    print("Processing page %i" % i)
    page = doc.load_page(i)

    # find every occurrence of the phrase on this page
    hits = page.search_for("a word to redact")

    # mark each hit for redaction
    for rect in hits:
        page.add_redact_annot(rect)
    # apply once per page, after all annotations are added;
    # PDF_REDACT_IMAGE_NONE leaves overlapping images untouched
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
# then save the doc to a new PDF:
doc.save("new.pdf", garbage=3, deflate=True)

It redacts the text while leaving the rest of the PDF as is.
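
For what it's worth, the three options from the original request could probably be mapped onto the keyword arguments of `add_redact_annot`. Below is a rough sketch, not a tested implementation: the names `REDACT_STYLES` and `redact_pdf` are made up for illustration, and it assumes a PyMuPDF version where `fill=False` suppresses the box and `apply_redactions` accepts `images=`.

```python
# Keyword arguments for PyMuPDF's Page.add_redact_annot, one set per option
# from the original request. Colors are RGB tuples in the 0..1 range.
BLACK = (0, 0, 0)
WHITE = (1, 1, 1)

REDACT_STYLES = {
    # Option 1: remove the text, leave a solid black box in its place.
    "box": {"fill": BLACK},
    # Option 2: replace the text with the word REDACTED, no box
    # (fill=False is assumed to leave the rectangle transparent).
    "label": {"text": "REDACTED", "fill": False},
    # Option 3: black box with the word REDACTED written on it in white.
    "box+label": {"text": "REDACTED", "fill": BLACK, "text_color": WHITE},
}


def redact_pdf(src, dst, needle, style="box"):
    """Redact every occurrence of `needle` in `src` and write the result to `dst`.

    Returns the number of redacted occurrences. Hypothetical helper.
    """
    kwargs = REDACT_STYLES[style]  # raises KeyError for an unknown style

    # Imported here so the style table above can be used without PyMuPDF.
    import fitz  # PyMuPDF

    count = 0
    with fitz.open(src) as doc:
        for page in doc:
            for rect in page.search_for(needle):
                page.add_redact_annot(rect, **kwargs)
                count += 1
            # Apply once per page; leave images that overlap a hit untouched.
            page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
        doc.save(dst, garbage=3, deflate=True)
    return count
```

Something like `redact_pdf("in.pdf", "out.pdf", "a word to redact", style="box+label")` would then cover option 3. The redaction removes the underlying text inside each rectangle without reflowing the page, so spacing and layout should stay put, but the replacement word may be truncated or wrapped if the rectangle is narrower than "REDACTED".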

youcefs21 avatar Jun 02 '24 04:06 youcefs21