Stirling-PDF

[Feature Request]: Add PDF Obfuscation Feature

Open • OteJlo opened this issue 1 month ago • 2 comments

Feature Description

I would like to request a feature that allows PDF obfuscation. The idea is to transform the original PDF into an image-based representation while replacing the underlying text layer with randomized characters. This means that visually, the PDF looks identical to the original, but when someone selects and copies text, they only get meaningless characters instead of the real content.

This feature would help prevent data harvesting from confidential documents while still allowing the document to be shared for visual purposes.

Why is this feature valuable?

This feature is valuable for organizations that need to share sensitive documentation without exposing the actual text content to automated tools or copy-paste operations. It would significantly reduce the risk of data leaks through text extraction while maintaining the usability of the document for viewing purposes.

For example:

  • Sharing internal technical documentation with external partners without allowing text mining.
  • Publishing documents online where the visual layout matters but the text must remain protected.

Suggested Implementation

  • Convert each page of the PDF into an image (e.g., PNG or JPEG).
  • Overlay a hidden text layer with randomized characters matching the original text positions.
  • Ensure the resulting PDF preserves the original appearance but breaks text selection integrity.
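A minimal sketch of the per-page step described above, in Python. It assumes the rasterized page has already been added to the page resources as an image XObject named `/Im0` and that a font `/F1` is available; both resource names, and the `TextRun` model of the extracted text, are hypothetical. The sketch only builds the page content stream. It first paints the image over the whole page. It then draws the scrambled text with text rendering mode 3, which PDF defines as neither fill nor stroke (invisible). OCR tools use the same technique to put a selectable text layer behind a scanned image.

```python
import random
import string
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextRun:
    """One run of text extracted from the original page (hypothetical model)."""
    x: float          # baseline origin in PDF user space (points)
    y: float
    font_size: float
    text: str


def scramble(text: str, rng: random.Random) -> str:
    """Replace every non-space character with a random lowercase ASCII letter.

    Keeping the length and the spaces means word boundaries roughly line up
    with the image underneath. The output is plain ASCII letters and spaces,
    so it can go into a PDF literal string without escaping.
    """
    return "".join(c if c == " " else rng.choice(string.ascii_lowercase) for c in text)


def obfuscated_page_stream(width: float, height: float,
                           runs: List[TextRun], rng: random.Random) -> str:
    """Build a content stream: full-page image, then invisible scrambled text.

    Assumes the page's MediaBox starts at (0, 0) and that /Im0 and /F1 are
    defined in the page resources (hypothetical names).
    """
    ops = [
        "q",
        f"{width:g} 0 0 {height:g} 0 0 cm",  # stretch the unit-square image over the page
        "/Im0 Do",                           # paint the rasterized original page
        "Q",
        "BT",
        "3 Tr",                              # rendering mode 3: invisible text
    ]
    for run in runs:
        ops.append(f"/F1 {run.font_size:g} Tf")
        ops.append(f"1 0 0 1 {run.x:g} {run.y:g} Tm")  # place the run at its original position
        ops.append(f"({scramble(run.text, rng)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

Only the character count and the spaces of each run survive, so selection boxes land on roughly the right words. A fuller version would also set per-run horizontal scaling (the `Tz` operator) so the invisible glyph widths match the text in the image.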

Optional enhancements:

  • Allow configuration of the randomness level (e.g., replace with random letters, symbols, or a fixed placeholder).
  • Provide batch processing for multiple PDFs.
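The configurable randomness level and batch processing could be sketched as below. The mode names (`letters`, `symbols`, `placeholder`) are hypothetical stand-ins, not existing Stirling-PDF options. So is the representation of each PDF as a plain list of extracted text runs. Each replacement keeps the original length and spaces, so the fake text stays aligned with the page image. The symbol alphabet deliberately omits `(`, `)` and `\`, so the output can go into a PDF literal string without escaping.

```python
import random
import string
from typing import List, Optional

# Hypothetical option names mirroring the "randomness level" idea above.
MODES = ("letters", "symbols", "placeholder")

# Avoids "(", ")" and "\", which would need escaping in a PDF literal string.
SYMBOLS = "#$%&*+=?@^~"


def obfuscate_text(text: str, mode: str = "letters", placeholder: str = "x",
                   rng: Optional[random.Random] = None) -> str:
    """Return a same-length string with every non-space character replaced."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "placeholder":
        if len(placeholder) != 1:
            raise ValueError("placeholder must be a single character")
        return "".join(c if c == " " else placeholder for c in text)
    if rng is None:
        rng = random.Random()
    alphabet = string.ascii_letters if mode == "letters" else SYMBOLS
    return "".join(c if c == " " else rng.choice(alphabet) for c in text)


def obfuscate_batch(documents: List[List[str]], mode: str = "letters",
                    seed: Optional[int] = None) -> List[List[str]]:
    """Obfuscate several documents, each given as a list of text runs.

    One RNG is shared across the whole batch, so a fixed seed gives
    reproducible output for the same input.
    """
    rng = random.Random(seed)
    return [[obfuscate_text(run, mode, rng=rng) for run in doc] for doc in documents]
```

For example, `obfuscate_text("Top secret", "placeholder", placeholder="#")` returns `"### ######"`. Passing a `seed` to `obfuscate_batch` makes a batch run reproducible. Leaving it as `None` gives a fresh random result each time.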

Additional Information

This feature would complement existing security measures and provide a lightweight alternative to full encryption when the goal is to prevent text extraction rather than restrict access.

No Duplicate of the Feature

  • [x] I have verified that there are no existing feature requests similar to my request.

OteJlo, Nov 14 '25 08:11

Hi, thanks for the ticket. However, I have some personal notes on this:

I am very much against this. Firstly, "PDF obfuscation" already exists; that's what the Scanner Effect is for. Secondly, a feature like this, where the PDF looks like a normal one to less tech-literate people, is NOT good. I (and probably other people in the PDF open-source world) don't want to deal with people showing up with obfuscated PDFs for us to debug.

In the past, AFAIK, there have been PDF programs that did this, and in the long run all they achieved was damaging the reputation of PDF software and the general trust people have in us (developers) and in the file format. We want people to have a good experience using the software EVEN when the PDF they are using is less than ideal or an edge case. This feature would put a lot of PDFs into circulation that break all sorts of software, and developers would then need to respond and explain this to each person. I do not want this burden on anybody. I've already used PDFDebugger too much.

I won't close this, but I very strongly oppose it. We have to be responsible: the features we build should be a positive thing in the long run.

balazs-szucs, Nov 14 '25 09:11

Thank you for sharing your perspective and for explaining the concerns in detail. I completely understand the potential risks you mentioned regarding compatibility and the overall trust in the PDF ecosystem.

That said, I believe this feature could still be offered as an optional setting, clearly marked and disabled by default. The goal is not to create problematic PDFs but to address a very specific need: preventing automated text extraction while preserving the visual integrity of the document.

This functionality could be valuable in scenarios where confidentiality is critical, such as internal documentation or sensitive publications. Of course, it would be important to include clear warnings about the implications and best practices for using this feature.

In short, I think providing this option—while leaving the choice to the user—could be a positive addition. Ultimately, each organization should decide which solution best fits its requirements.

OteJlo, Nov 14 '25 13:11