wip: add decrypt pdf action

Open MikeDabrowski opened this issue 1 year ago • 3 comments

Description

This PR adds a decryptPDF action. It takes attached PDFs and stores both the original and the decrypted version in the chosen location.

It also adds a new dependency, the @cantoo/pdf-lib fork, which supports PDF decryption. This library uses promises, so decrypting the PDF is async as well, while the rest of the code is synchronous.

Fixes #355

Type of change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] This change requires a documentation update
  • [ ] Other type of change (test, build, refactoring, ...)

How has this been tested?

TODO: Take a password-protected file, decrypt it, then try to open the result without a password. Also add a test similar to the one for the store action.

  • [ ] Test example A: exampleA.js
  • [ ] ... (add more, if required or delete this line)

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing tests pass locally with my changes

MikeDabrowski avatar Jul 16 '24 08:07 MikeDabrowski

@ahochsteger I started adding the feature to gmail-processor and would like to ask for some guidance. Because pdf-lib uses promises I decided to create a separate action, but in theory the two could be merged. In my initial test, promise usage had no visible side effects, but I was not relying on anything returned from the function. Making this async could impact many other places.
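The concern about return values can be illustrated with a minimal sketch. The function names below are hypothetical stand-ins, not gmail-processor APIs; the point is only that a synchronous caller receives a pending Promise from an async action, so anything that reads the action's result directly would break.

```javascript
// Hypothetical stand-ins for a synchronous and an async action.
function syncAction() {
  return { ok: true }
}

async function asyncAction() {
  return { ok: true }
}

// A synchronous runner that calls each action and collects what it returns.
function runActionsSync(actions) {
  return actions.map((action) => action())
}

const results = runActionsSync([syncAction, asyncAction])

console.log(results[0].ok) // true: the sync action returns its result directly
console.log(results[1] instanceof Promise) // true: the async action returns a Promise
console.log(results[1].ok) // undefined: the result is not available synchronously
```

As long as nothing reads the return value, the difference stays invisible, which would match the "no visible side effects" observation.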

MikeDabrowski avatar Jul 16 '24 08:07 MikeDabrowski

@MikeDabrowski thanks for the PR - so far it looks good to me as a start. I'll add some comments to the code to let you know how I usually do things in Gmail Processor, esp. to enable test automation (both locally during the build and with end-to-end tests running directly on Google Apps Script).

For local testing with Jest I usually mock the services provided by GAS, like Utilities, and make them available via the environment context, e.g. ctx.env.utilities. See EnvProvider.ts for the GAS services that can be accessed through the environment context and are automatically provided as mocks for local Jest tests.
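A rough sketch of that pattern, written without Jest so it stays self-contained. The helper names `attachmentToBase64` and `createMockContext` are hypothetical, and the mock uses Node's `Buffer` in place of the real GAS `Utilities.base64Encode`; the actual mocks in the project may look quite different.

```javascript
// Code under test: it only reaches GAS through the injected ctx.env,
// never through the global Utilities object.
function attachmentToBase64(ctx, attachment) {
  return ctx.env.utilities.base64Encode(attachment.getBytes())
}

// Hand-rolled mock context (hypothetical helper). It records the calls it
// receives and emulates base64Encode with Node's Buffer.
function createMockContext() {
  const calls = []
  return {
    calls,
    env: {
      utilities: {
        base64Encode: (bytes) => {
          calls.push(bytes)
          return Buffer.from(bytes).toString("base64")
        },
      },
    },
  }
}

const mockCtx = createMockContext()
const fakeAttachment = { getBytes: () => [104, 105] } // the bytes of "hi"
const encoded = attachmentToBase64(mockCtx, fakeAttachment)

console.log(encoded) // "aGk="
console.log(mockCtx.calls.length) // 1
```

Because the action never touches a global GAS service, the same code can run locally against the mock and on Google Apps Script against the real service.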

Give me some time to try it out myself and I'll give you some more guidance or maybe directly change some things myself in case I feel that it may be a bit tricky to solve.

ahochsteger avatar Jul 18 '24 05:07 ahochsteger

But I still have to think about how to support async functions in Gmail Processor actions though ... In case you've got some ideas I'm all ears ;-).

Just off the top of my head - the web IDE of GAS told me at some point that top-level await is available. So GAS has at least some async support built in already. When I was hacking on the decryption without gmail-processor I just made the outer function async and went with it.

I assume that doing the same here might not be so easy - this lib is far more complex than I initially thought. However, even if you had to make every single fn async, I suppose it would still be usable, at least in the basic way described in the docs. The way I use it, I just have an outer function processMails which calls gmail-processor. Nothing more, nothing else. I don't know if there are any other usages that would break if run became async.

The other idea, the 'kinda works' idea, is to put the async stuff into a separate action, just add a then and leave it be. Just make sure that whatever it does, and whenever it does it, it won't impact any other process. But let's leave it as the last-resort option, it is not really for production code :/
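The two options can be compared in a small sketch. Everything here is hypothetical: `decryptStep` stands in for the promise-based PDF decryption, and neither function reflects how gmail-processor actually runs actions.

```javascript
// Hypothetical async step standing in for the promise-based decryption.
async function decryptStep(log) {
  await Promise.resolve()
  log.push("decrypted")
}

// Option 1: make the outer function async and await the step.
// The caller gets a Promise, but the order of operations is predictable.
async function processMailsAwaited() {
  const log = []
  await decryptStep(log)
  log.push("done")
  return log
}

// Option 2: fire and forget with then(). The outer function stays
// synchronous, but it returns before the async step has finished.
function processMailsDetached() {
  const log = []
  decryptStep(log).then(() => log.push("then-callback"))
  log.push("done")
  return log
}

const detachedLog = processMailsDetached()
const detachedAtReturn = detachedLog.slice() // snapshot taken at return time
console.log(detachedAtReturn) // [ 'done' ]

processMailsAwaited().then((awaitedLog) => {
  console.log(awaitedLog) // [ 'decrypted', 'done' ]
})
```

The second option is where the "whenever it does its thing" risk comes from: the decryption completes at some point after the caller has already moved on, so errors and ordering are no longer under the caller's control.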

Thanks for the review, I'll try to carve some time to address it in the next few days

MikeDabrowski avatar Jul 18 '24 05:07 MikeDabrowski

Pull Request Test Coverage Report for Build 13615723321

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 47 of 103 (45.63%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 89.922%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/lib/e2e/E2E.ts | 0 | 4 | 0.0%
src/lib/actions/AttachmentActions.ts | 15 | 67 | 22.39%
Total | 47 | 103 | 45.63%

Totals
Change from base Build 13322142560: -0.4%
Covered Lines: 8712
Relevant Lines: 9537

💛 - Coveralls

coveralls avatar Feb 23 '25 09:02 coveralls

Have you figured a way to handle async functions ?

MikeDabrowski avatar Feb 23 '25 12:02 MikeDabrowski

@MikeDabrowski I was able to update this PR and it is now in a functional state using an async custom action. Moving the async support into the library did not (yet) work, but I'm still investigating how to reduce the complexity on the usage side. It would be great if you could give it a try and provide feedback using the beta testing script id 1yhOQyl_xWtnGJn_bzlL7oA4d_q5KoMyZyWIqXDJX1SY7bi22_lpjMiQK with version HEAD. This is a working example that uses a simple encrypted PDF that is included in this PR at src/e2e-test/files/encrypted.pdf (it uses "dry-run" mode, which you might want to change):

function decryptPdfRun() {
  const config = {
    description:
      "The action `custom.decryptAndStorePdf` decrypts and stores a PDF file.",
    settings: {
      markProcessedMethod: "mark-read",
    },
    global: {
      thread: {
        match: {
          query:
            "has:attachment -in:trash -in:drafts -in:spam after:{{date.now|formatDate('yyyy-MM-dd')}} is:unread subject:\"[GmailProcessor-Test] decryptPdf\"",
        },
      },
    },
    threads: [
      {
        match: {
          query: "subject:([GmailProcessor-Test] decryptPdf)",
        },
        attachments: [
          {
            description: "Process all attachments named 'encrypted*.pdf'",
            match: {
              name: "(?<basename>encrypted.*)\\.pdf$",
            },
            actions: [
              {
                name: "custom.decryptAndStorePdf",
                args: {
                  location:
                    "/GmailProcessor-Tests/e2e/advanced/{{message.date|formatDate('yyyy-MM-dd')}}/decrypted.pdf",
                  conflictStrategy: "replace",
                  password: "test",
                },
              },
            ],
          },
        ],
      },
    ],
  }

  const customActions = [
    {
      name: "decryptAndStorePdf",
      action: async (ctx, args) => {
        const location = args.location
        try {
          ctx.log.info(`decryptAndStorePdf(): location=${location}`)
          const attachment = ctx.attachment.object
          const base64Content = ctx.env.utilities.base64Encode(
            attachment.getBytes(),
          )
          ctx.log.info(`decryptAndStorePdf(): Loading PDF document ...`)
          const pdfDoc = await GmailProcessorLib.PDFDocument.load(
            base64Content,
            {
              password: args.password,
              ignoreEncryption: true,
            },
          )
          ctx.log.info(`decryptAndStorePdf(): Decrypt PDF content ...`)
          const decryptedContent = await pdfDoc.save()
          ctx.log.info(`decryptAndStorePdf(): Create new PDF blob ...`)
          const decryptedPdf = ctx.env.utilities.newBlob(
            decryptedContent,
            attachment.getContentType(),
            attachment.getName(),
          )
          ctx.log.info(
            `decryptAndStorePdf(): Store PDF file to '${location}' ...`,
          )
          return ctx.proc.gdriveAdapter.createFileFromAction(
            ctx,
            args.location,
            decryptedPdf,
            args.conflictStrategy,
            args.description,
            "decrypted PDF",
            "custom",
            "custom.decryptAndStorePdf",
          )
        } catch (e) {
          ctx.log.error(
            `Error while saving decrypted pdf to ${location}: ${e}`,
          )
          throw e
        }
      },
    },
  ]
  return GmailProcessorLib.run(config, "dry-run", customActions)
}

ahochsteger avatar Feb 24 '25 07:02 ahochsteger

To summarize, these are the topics I'd like to address before releasing it (added to the description as well):

  • [x] Hide async complexity in the lib itself and provide a built-in action to decrypt PDFs (if possible somehow)
  • [x] Review the direct exposure of PDFDocument through GmailProcessorLib and maybe encapsulate it in ctx.env.pdfDocument.
  • [ ] Investigate the possibility to move the (huge) pdf lib to a separate Google Apps Script Library to make it an optional dependency.
  • [x] Update the documentation with good examples

ahochsteger avatar Feb 24 '25 07:02 ahochsteger

@MikeDabrowski I was now able to fully integrate it as an async action attachment.storeDecryptedPdf that can be used this way:

  {
    "name": "attachment.storeDecryptedPdf",
    "args": {
      "location": "decrypted.pdf",
      "conflictStrategy": "replace",
      "password": "..."
    }
  }

I intentionally left out all additional properties for storing both the original and the decrypted version, to keep the implementation as simple as possible. The original version can be stored by an additional attachment.store action anyway.
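A sketch of how the two actions might be combined to keep both versions. The file paths are hypothetical, the password is left elided as in the snippet above, and it is an assumption here that attachment.store accepts the same location and conflictStrategy arguments.

```javascript
// Sketch only: paths are hypothetical and the password is intentionally elided.
const actions = [
  {
    // Keeps the original, still encrypted attachment.
    name: "attachment.store",
    args: {
      location: "/Invoices/original.pdf",
      conflictStrategy: "replace",
    },
  },
  {
    // Stores the decrypted copy next to it.
    name: "attachment.storeDecryptedPdf",
    args: {
      location: "/Invoices/decrypted.pdf",
      conflictStrategy: "replace",
      password: "...",
    },
  },
]

console.log(actions.map((action) => action.name).join(", "))
```

Both entries would go into the actions list of the same attachment config, so each matching attachment is stored once as-is and once decrypted.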

ahochsteger avatar Mar 02 '25 13:03 ahochsteger

@ahochsteger Brilliant work! Thank you for completing this.

I can confirm that version 35 is working for me in my 'production' case!

MikeDabrowski avatar Mar 03 '25 16:03 MikeDabrowski