powertools-lambda-typescript

Feature request: sequential async processing

Open revmischa opened this issue 1 year ago • 8 comments

Use case

Sometimes I have records that I want processed one at a time, but my processor function happens to be async.

Solution/User Experience

It would be nice to request sequential processing of records with async handlers.

Alternative solutions

No response

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

revmischa avatar Dec 21 '23 05:12 revmischa

Hi @revmischa thank you for taking the time to open this issue.

As we discussed in Discord, I think the request is valid and I remember hearing it from other customers in the past few weeks.

I'm adding this to the backlog so that it can be picked up.

If anyone is interested in contributing, please leave a comment so we can discuss an implementation.

dreamorosi avatar Dec 21 '23 05:12 dreamorosi

I also have a use case where I have an async handler but need to process FIFO events sequentially.

I think it's a safe assumption that almost every handler anyone will ever write will be async. How else would you do I/O? And what is the use of a handler that can't do I/O?

revmischa avatar Jan 12 '24 19:01 revmischa

That's very fair, we are focused on releasing v2 this & next week.

After that we'll be able to reprise working on new features for the existing utilities. This is one of the issues I'd like to pick up relatively soon.

dreamorosi avatar Feb 21 '24 11:02 dreamorosi

I can work on this next. I can see there is a section in the doc about Async processing,

  • If your function is async returning a Promise, use BatchProcessor and processPartialResponse
  • If your function is not async, use BatchProcessorSync and processPartialResponseSync

So, based on the PR description, do we now want to have the option to use BatchProcessorSync and processPartialResponseSync in an async function? It would be helpful if I could have some more context. @dreamorosi, whenever you are free.

arnabrahman avatar Jul 02 '24 14:07 arnabrahman

Also just curious, what is the use case for sync processing? You can't really do I/O without async right? So what use is a SQS processing function that can't do any I/O?

revmischa avatar Jul 02 '24 15:07 revmischa

Hi @arnabrahman - thank you for reviving the conversation on this feature request.

When we initially ported the Batch Processing utility from the Python version of Powertools for AWS Lambda, we did so mirroring their preferred patterns: meaning we made the synchronous & sequential processor the default, and the asynchronous & parallel one the alternative one.

In hindsight, this was a mistake because - as @revmischa points out - in modern Node.js working with async/await and promises is the de facto standard when dealing with I/O.

In the next release, and before the utility was considered generally available, we corrected this: we made the BatchProcessor asynchronous by default and made the sync one (BatchProcessorSync) the secondary option.

Currently the async processing only supports processing the items in the batch in parallel (implementation is here).

For example, today you can do this, which will call the recordHandler on each item in the batch in parallel:

import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import type { SQSRecord, SQSHandler } from 'aws-lambda';

const processor = new BatchProcessor(EventType.SQS);

const recordHandler = async (record: SQSRecord): Promise<void> => {
  // ... do your async processing
};
 
export const handler: SQSHandler = async (event, context) =>
  processPartialResponse(event, recordHandler, processor, {
    context,
  });

As part of this feature request we should allow customers to also use sequential processing, with a flag similar to this:

import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import type { SQSRecord, SQSHandler } from 'aws-lambda';

const processor = new BatchProcessor(EventType.SQS);

const recordHandler = async (record: SQSRecord): Promise<void> => {
  // ... do your async processing
};
 
export const handler: SQSHandler = async (event, context) =>
  processPartialResponse(event, recordHandler, processor, {
    context,
    processInParallel: false // new flag, name to be confirmed
  });

I'm not 100% sold on the name of the option being processInParallel, but with it I think we should convey the following:

  • by default, when using processPartialResponse & BatchProcessor, items are processed in parallel - this is to maintain backwards-compatibility (we might decide to change the default to sequential in the next major version, but that's a separate discussion)
  • by using this new option I'm opting out of the default behavior and instead choosing to have the utility call my async record handler sequentially following the order of the items as they appear in the batch.

Consequently, regardless of the name we choose for the option, we will have to modify the process() method in the BatchProcessor class to check the value of the new option and, when the customer has opted out, call & await each record handler sequentially.
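To make the difference between the two modes concrete, here is a minimal, standalone sketch of the branching described above. It is not the actual BatchProcessor code: processRecords, settle, Outcome and ProcessOptions are hypothetical names used only for illustration, and processInParallel is the tentative option name from this comment.

```typescript
// Hypothetical sketch, NOT the real BatchProcessor implementation.
type RecordHandler<T> = (record: T) => Promise<unknown>;

type Outcome =
  | { status: 'fulfilled'; value: unknown }
  | { status: 'rejected'; reason: unknown };

interface ProcessOptions {
  // Assumed option name taken from the proposal; still to be confirmed.
  processInParallel?: boolean;
}

// Run one handler and capture success or failure instead of throwing,
// so a single failed record does not abort the rest of the batch.
const settle = async <T>(
  handler: RecordHandler<T>,
  record: T
): Promise<Outcome> => {
  try {
    return { status: 'fulfilled', value: await handler(record) };
  } catch (reason) {
    return { status: 'rejected', reason };
  }
};

const processRecords = async <T>(
  records: T[],
  handler: RecordHandler<T>,
  options: ProcessOptions = {}
): Promise<Outcome[]> => {
  // Parallel stays the default to remain backwards-compatible.
  const { processInParallel = true } = options;

  if (processInParallel) {
    // Start every handler right away, then wait for all of them.
    return Promise.all(records.map((record) => settle(handler, record)));
  }

  // Opted out: the next handler starts only after the previous one
  // has settled, following the order of the items in the batch.
  const results: Outcome[] = [];
  for (const record of records) {
    results.push(await settle(handler, record));
  }
  return results;
};
```

In both branches the results come back in batch order; only the sequential branch guarantees that the handler for one record has finished before the handler for the next record starts, which is what FIFO use cases need.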

Regarding the BatchProcessorSync and processPartialResponseSync, I don't think there will have to be any changes for this new feature to be added.

dreamorosi avatar Jul 08 '24 15:07 dreamorosi