powertools-lambda-typescript
powertools-lambda-typescript copied to clipboard
Feature request: ability to not throw error on full batch failure
Use case
[!note] This feature request comes from #1785
Currently when using the Batch Processor utility, if all the records in a batch are marked as failed the utility throws a BatchProcessingError
.
Taking into consideration that the utility is supposed to be used with partial failure reporting, a Lambda function that throws an error is functionally equal to a partial failure that reports all items as failed in the sense that all the items in that batch are retried as a result.
While we initially implemented this as an error to reflect the full batch failure in the operational metrics (i.e. function runtime errors), there are cases such as when processing small batches that this behavior can skew the metrics due to higher chances of a full batch to fail.
To accommodate these use cases, as well as those customers who simply want to avoid throwing an error, we should add a new throwOnFullBatchFailure
option to the utility that allows customers to opt out of the error throwing mechanism.
Solution/User Experience
I haven't spent a lot of time thinking on where the new option should land, but the two options that come to mind are either when initializing the processor:
const processor = new BatchProcessor(
EventType.SQS,
{
throwOnFullBatchFailure: false
});
or in the process function itself:
export const handler = async (
event: SQSEvent,
context: Context
): Promise<SQSBatchResponse> => {
return processPartialResponse(event, recordHandler, processor, {
context,
throwOnFullBatchFailure: false
});
};
Without looking at the internal implementation - which will inevitably inform the decision - I'm more inclined towards the second option for additional granularity (i.e. I might want to reuse the same BatchProcessor
across multiple routes) and also because we already have a configuration object parameter that we can extend.
Alternative solutions
No response
Acknowledgment
- [X] This feature request meets Powertools for AWS Lambda (TypeScript) Tenets
- [ ] Should this be considered in other Powertools for AWS Lambda languages? i.e. Python, Java, and .NET
Future readers
Please react with 👍 and your use case to help us understand customer demand.
As discussed with @dreamorosi , I want to support the introduction of a throwOnFullBatchFailure
option, for two reasons:
- throwing FullBatchFailureError has the unintended (and currently undocumented) side effect of making Lambda back off
- currently, if a user wants to disable this side effect, the processor class needs to be patched
Here's an example: this Lambda function is allowed to scale up to 100 concurrent executions, but it does not because it sees many FullBatchFailureErrors, which make it scale down. The official AWS docs are a little ambiguous and suggest that this behavior is not active when partial failure reporting is configured. In fact, it is active and behaves the same, i.e. Lambda concurrency is reduced when errors are thrown. This was confirmed by an AWS engineer.
If throwing FullBatchFailureError is disabled, e.g. by overwriting the right method:
processor.entireBatchFailed = (): boolean => false
Then the function will use its full concurrency.
You can debate if scaling up is actually useful when there are so many failed batches, but that is besides the point. It is a non-obvious behavior that is not present if you implement partial failure reporting without throwing a specific error, and the AWS docs do not suggest to implement such an error.
In addition to implementing the throwOnFullBatchFailure
option I would suggest to:
- explain this side effect in the main documentation of Powertools so that users can understand it
- consider changing the default value of the option to false in the next major version
I believe adding the option inside process
function is more appropriate, as we have done similarly for the skipGroupOnError
option.
Just to confirm, if throwOnFullBatchFailure
is not provided, we should maintain the current behavior of throwing the error, correct? @dreamorosi
By the way, you can assign this to me.
Hi @arnabrahman, yes the default behavior or when the option is not provided, should be the same as now.
I agree with where to put the new option as well, let's do it!
Thank you!
⚠️ COMMENT VISIBILITY WARNING ⚠️
This issue is now closed. Please be mindful that future comments are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.