gotenberg icon indicating copy to clipboard operation
gotenberg copied to clipboard

feat: encrypt

Open gulien opened this issue 7 months ago • 6 comments

Closes #453.

@thoven87 for whatever reasons, all the engines are failing to encrypt if metadata are added in the same process 🤔 any idea why?

gulien avatar May 28 '25 13:05 gulien

Closes #453.

@thoven87 for whatever reasons, all the engines are failing to encrypt if metadata are added in the same process 🤔 any idea why?

@gulien I suppose this is the test that's failing? If that's the case, I don't see userPassword nor ownerPassword being set?

thoven87 avatar May 28 '25 14:05 thoven87

Yep that was the test (and similar ones). I’ve removed the userPassword as there is no decrypt option for now.

Still, the API is throwing a 500 when trying to add metadata and encryption. For all PDF engines 😬

gulien avatar May 28 '25 14:05 gulien

Yep that was the test (and similar ones). I’ve removed the userPassword as there is no decrypt option for now.

Still, the API is throwing a 500 when trying to add metadata and encryption. For all PDF engines 😬

I will take a look and see what I can find.

thoven87 avatar May 28 '25 17:05 thoven87

@gulien should we try encrypting a file first if metadata is requested and then add metadata after? I see that exifTool can read/write metadata to encrypted documents provided the password is passed in.

thoven87 avatar May 30 '25 13:05 thoven87

ok, I have a solution that I think works fine @gulien

// EncryptPdfStub adds password protection to PDF files.
func EncryptPdfStub(ctx *api.Context, engine gotenberg.PdfEngine, userPassword, ownerPassword string, inputPaths []string) error {
	if userPassword == "" {
		return nil
	}

	for _, inputPath := range inputPaths {
		err := engine.Encrypt(ctx, ctx.Log(), inputPath, userPassword, ownerPassword)
		if err != nil {
			ctx.Log().Warn(fmt.Sprintf("PDF encryption failed for '%s' - this is often due to incompatibility between metadata engines (ExifTool) and encryption engines (QPDF/pdfcpu). Consider using metadata OR encryption, not both in the same request.", inputPath), zap.Error(err))
			return fmt.Errorf("encrypt PDF '%s': %w", inputPath, err)
		}
	}

	return nil
}

// EncryptPdfStubRobust adds password protection to PDF files with fallback strategies for metadata conflicts.
func EncryptPdfStubRobust(ctx *api.Context, engine gotenberg.PdfEngine, userPassword, ownerPassword string, inputPaths []string, metadata map[string]interface{}) error {
	if userPassword == "" {
		return nil
	}

	// If no metadata, just encrypt normally
	if len(metadata) == 0 {
		return EncryptPdfStub(ctx, engine, userPassword, ownerPassword, inputPaths)
	}

	// Strategy 1: Try encrypting after metadata has been applied (current approach)
	err := EncryptPdfStub(ctx, engine, userPassword, ownerPassword, inputPaths)
	if err == nil {
		ctx.Log().Info("Successfully applied both metadata and encryption using standard approach")
		return nil
	}

	ctx.Log().Warn("Standard metadata-then-encrypt approach failed, trying fallback strategies", zap.Error(err))

	// Strategy 2: Try encrypt-first approach using temporary files
	for _, inputPath := range inputPaths {
		err := encryptThenMetadataFallback(ctx, engine, inputPath, userPassword, ownerPassword, metadata)
		if err != nil {
			return fmt.Errorf("encrypt PDF '%s' using fallback strategies: %w", inputPath, err)
		}
	}

	ctx.Log().Info("Successfully applied both metadata and encryption using encrypt-first fallback strategy")
	return nil
}

// encryptThenMetadataFallback tries to encrypt first, then apply metadata
func encryptThenMetadataFallback(ctx *api.Context, engine gotenberg.PdfEngine, inputPath, userPassword, ownerPassword string, metadata map[string]interface{}) error {
	// Create a backup of the original file
	backupPath := inputPath + ".backup"
	err := copyFile(inputPath, backupPath)
	if err != nil {
		return fmt.Errorf("create backup file: %w", err)
	}
	defer func() {
		// Clean up backup file
		_ = os.Remove(backupPath)
	}()

	// Try encrypt-first approach
	err = engine.Encrypt(ctx, ctx.Log(), inputPath, userPassword, ownerPassword)
	if err != nil {
		// Restore from backup and return error
		_ = copyFile(backupPath, inputPath)
		return fmt.Errorf("encrypt-first approach failed: %w", err)
	}

	// Now try to apply metadata to the encrypted file
	err = engine.WriteMetadata(ctx, ctx.Log(), metadata, inputPath)
	if err != nil {
		ctx.Log().Warn("Could not apply metadata to encrypted PDF, keeping encrypted PDF without metadata", zap.Error(err))
		// We still have an encrypted PDF, which is better than nothing
		return nil
	}

	return nil
}

// copyFile copies a file from src to dst
func copyFile(src, dst string) error {
	srcFile, err := os.Open(src)
	if err != nil {
		return err
	}
	defer srcFile.Close()

	dstFile, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer dstFile.Close()

	_, err = io.Copy(dstFile, srcFile)
	if err != nil {
		return err
	}

	return dstFile.Sync()
}

all the routes use EncryptPdfStubRobust instead of EncryptPdfStub

curl -X POST \
  "http://localhost:3000/forms/chromium/convert/url" \
  -F "url=https://example.com" \
  -F 'metadata={"Author":"Test Author","Title":"Test Document"}' \
  -F "userPassword=test123" \
  -o test_output.pdf

does produce an encrypted pdf file with the specified metadata. What do you think? I am waiting to get PR running properly and merge hopefully soon so that I can push the current decrypt logic I have. I think in the future, a struct can be used to allow a user to set permission when encrypting a file as well.

thoven87 avatar May 30 '25 14:05 thoven87

Interesting! Wouldn't it make more sense to just always add the metadata after the encryption?

EDIT: not working actually, heh.

gulien avatar Jun 02 '25 08:06 gulien