powertools-lambda-typescript icon indicating copy to clipboard operation
powertools-lambda-typescript copied to clipboard

Feature request: ability to not throw error on full batch failure

Open dreamorosi opened this issue 1 year ago • 3 comments

Use case

[!note] This feature request comes from #1785

Currently when using the Batch Processor utility, if all the records in a batch are marked as failed the utility throws a BatchProcessingError.

Taking into consideration that the utility is supposed to be used with partial failure reporting, a Lambda function that throws an error is functionally equal to a partial failure that reports all items as failed in the sense that all the items in that batch are retried as a result.

While we initially implemented this as an error to reflect the full batch failure in the operational metrics (i.e. function runtime errors), there are cases such as when processing small batches that this behavior can skew the metrics due to higher chances of a full batch to fail.

To accommodate these use cases, as well as those customers who simply want to avoid throwing an error, we should add a new throwOnFullBatchFailure option to the utility that allows customers to opt out of the error throwing mechanism.

Solution/User Experience

I haven't spent a lot of time thinking on where the new option should land, but the two options that come to mind are either when initializing the processor:

const processor = new BatchProcessor(
  EventType.SQS,
  {
    throwOnFullBatchFailure: false
});

or in the process function itself:

export const handler = async (
  event: SQSEvent,
  context: Context
): Promise<SQSBatchResponse> => {
  return processPartialResponse(event, recordHandler, processor, {
    context,
    throwOnFullBatchFailure: false
  });
};

Without looking at the internal implementation - which will inevitably inform the decision - I'm more inclined towards the second option for additional granularity (i.e. I might want to reuse the same BatchProcessor across multiple routes) and also because we already have a configuration object parameter that we can extend.

Alternative solutions

No response

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

dreamorosi avatar Feb 21 '24 17:02 dreamorosi

As discussed with @dreamorosi , I want to support the introduction of a throwOnFullBatchFailure option, for two reasons:

  • throwing FullBatchFailureError has the unintended (and currently undocumented) side effect of making Lambda back off
  • currently, if a user wants to disable this side effect, the processor class needs to be patched

Here's an example: this Lambda function is allowed to scale up to 100 concurrent executions, but it does not because it sees many FullBatchFailureErrors, which make it scale down. The official AWS docs are a little ambiguous and suggest that this behavior is not active when partial failure reporting is configured. In fact, it is active and behaves the same, i.e. Lambda concurrency is reduced when errors are thrown. This was confirmed by an AWS engineer.

image

If throwing FullBatchFailureError is disabled, e.g. by overwriting the right method: processor.entireBatchFailed = (): boolean => false

Then the function will use its full concurrency.

image

You can debate if scaling up is actually useful when there are so many failed batches, but that is besides the point. It is a non-obvious behavior that is not present if you implement partial failure reporting without throwing a specific error, and the AWS docs do not suggest to implement such an error.

In addition to implementing the throwOnFullBatchFailure option I would suggest to:

  • explain this side effect in the main documentation of Powertools so that users can understand it
  • consider changing the default value of the option to false in the next major version

486 avatar Feb 22 '24 10:02 486

I believe adding the option inside process function is more appropriate, as we have done similarly for the skipGroupOnError option.

Just to confirm, if throwOnFullBatchFailure is not provided, we should maintain the current behavior of throwing the error, correct? @dreamorosi

By the way, you can assign this to me.

arnabrahman avatar Jun 30 '24 05:06 arnabrahman

Hi @arnabrahman, yes the default behavior or when the option is not provided, should be the same as now.

I agree with where to put the new option as well, let's do it!

Thank you!

dreamorosi avatar Jun 30 '24 07:06 dreamorosi

⚠️ COMMENT VISIBILITY WARNING ⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Jul 09 '24 07:07 github-actions[bot]