powertools-lambda-typescript icon indicating copy to clipboard operation
powertools-lambda-typescript copied to clipboard

Feature Request: include messages on failed batch job stack trace

Open revmischa opened this issue 2 years ago • 9 comments

Expected Behaviour

If an async batch job processor throws an error, I would expect to see the error message in my output.

Current Behaviour

I see this error but nothing else; the exception that was thrown is not visible in my logs

  Error: All records failed processing. See individual errors below.
       at Pc.clean (/home/cyber/dev/foo/node_modules/.pnpm/@[email protected]/node_modules/@aws-lambda-powertools/batch/lib/BasePartialBatchProcessor.js:23:19)
       at Pc.process (/home/cyber/dev/foo/node_modules/.pnpm/@[email protected]/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:25:14)
       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
       at processPartialResponse (/home/cyber/dev/foo/node_modules/.pnpm/@[email protected]/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:5)
       at async file:///home/cyber/dev/foo/node_modules/.pnpm/[email protected]_@[email protected]/node_modules/sst/support/nodejs-runtime/index.mjs:148:16

There are no more individual errors below

Code snippet

import { Context, SQSBatchResponse, SQSEvent } from "aws-lambda"
import { BatchProcessor, EventType, processPartialResponse } from "@aws-lambda-powertools/batch"

const processor = new BatchProcessor(EventType.SQS)

export const handlerInner = async (event: SQSEvent, context: Context): Promise<SQSBatchResponse> => {
  return processPartialResponse(event, processRecord, processor, {
    context,
  })
}
const processRecord = async (event: SQSRecord) => {
  throw new Error("test error")
}

I would expect to see test error in my logs

Steps to Reproduce

Create a SQS queue, hook up the above handler, send a message, throw an exception in your processor handler. The error doesn't show up.

Possible Solution

Perhaps some change introduced in https://github.com/aws-powertools/powertools-lambda-typescript/issues/1647 ?

Powertools for AWS Lambda (TypeScript) version

latest

AWS Lambda function runtime

18.x

Packaging format used

npm

Notes

I'm using Serverless Stack but I don't think it should make any difference

revmischa avatar Oct 11 '23 04:10 revmischa

Hi @revmischa, thank you for reporting this. I'm looking into it and seeing if I can reproduce.

I'll report back here as soon as I have more info.

dreamorosi avatar Oct 11 '23 07:10 dreamorosi

I have been investigating the report and was not able to reproduce the bug in its entirety.

For context, when all the records in a batch fail to process the Batch Processing utility throws a FullBatchFailureError with message All records failed processing. See individual errors below. - this is the place in the code where the error is thrown, and this is where the error is defined.

When this happens in Lambda, at least in my test, the logs look like this:

START RequestId: 0638bcc9-5472-4a56-b723-3c63f09893e2 Version: $LATEST
2023-10-11T12:17:59.036Z	0638bcc9-5472-4a56-b723-3c63f09893e2	ERROR	Invoke Error 	{"errorType":"FullBatchFailureError","errorMessage":"All records failed processing. See individual errors below.","name":"FullBatchFailureError","recordErrors":[{"errorType":"FooError","errorMessage":"test error","name":"FooError","stack":["FooError: test error","    at j.processRecord (/lib/tracer.ts:54:9)","    at j.processRecord (/node_modules/@aws-lambda-powertools/batch/lib/BatchProcessor.js:10:39)","    at record (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:70)","    at Array.map (<anonymous>)","    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:49)","    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:21)","    at Runtime.handler (/lib/tracer.ts:69:10)","    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1147:29)"]}],"stack":["FullBatchFailureError: All records failed processing. See individual errors below.","    at j.clean (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialBatchProcessor.js:23:19)","    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:25:14)","    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:5)"]}
END RequestId: 0638bcc9-5472-4a56-b723-3c63f09893e2
REPORT RequestId: 0638bcc9-5472-4a56-b723-3c63f09893e2	Duration: 338.86 ms	Billed Duration: 339 ms	Memory Size: 128 MB	Max Memory Used: 74 MB	Init Duration: 152.97 ms	
XRAY TraceId: 1-65269276-03fd8ecf684b12ec78bac512	SegmentId: 34e34d3d1ae588ca	Sampled: true	

If we isolate at the 2nd entry, which is the error, the message is:

Invoke Error 	{"errorType":"FullBatchFailureError","errorMessage":"All records failed processing. See individual errors below.","name":"FullBatchFailureError","recordErrors":[{"errorType":"FooError","errorMessage":"test error","name":"FooError","stack":["FooError: test error","    at j.processRecord (/lib/tracer.ts:54:9)","    at j.processRecord (/node_modules/@aws-lambda-powertools/batch/lib/BatchProcessor.js:10:39)","    at record (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:70)","    at Array.map (<anonymous>)","    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:49)","    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:21)","    at Runtime.handler (/lib/tracer.ts:69:10)","    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1147:29)"]}],"stack":["FullBatchFailureError: All records failed processing. See individual errors below.","    at j.clean (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialBatchProcessor.js:23:19)","    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:25:14)","    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:5)"]}

which, once pretty-printed looks like this:

{
   "errorType":"FullBatchFailureError",
   "errorMessage":"All records failed processing. See individual errors below.",
   "name":"FullBatchFailureError",
   "recordErrors":[
      {
         "errorType":"FooError",
         "errorMessage":"test error",
         "name":"FooError",
         "stack":[
            "FooError: test error",
            "    at j.processRecord (/lib/tracer.ts:54:9)",
            "    at j.processRecord (/node_modules/@aws-lambda-powertools/batch/lib/BatchProcessor.js:10:39)",
            "    at record (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:70)",
            "    at Array.map (<anonymous>)",
            "    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:23:49)",
            "    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:21)",
            "    at Runtime.handler (/lib/tracer.ts:69:10)",
            "    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1147:29)"
         ]
      }
   ],
   "stack":[
      "FullBatchFailureError: All records failed processing. See individual errors below.",
      "    at j.clean (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialBatchProcessor.js:23:19)",
      "    at j.process (/node_modules/@aws-lambda-powertools/batch/lib/BasePartialProcessor.js:25:14)",
      "    at processPartialResponse (/node_modules/@aws-lambda-powertools/batch/lib/processPartialResponse.js:10:5)"
   ]
}

the individual errors are in an array held in the recordErrors property which is part of the FullBatchFailureError; the items of this array are whatever error has been thrown by the record handler, in this case a FooError error with message equal to test error.

Aside from that, I do agree however that the current format is hard to read and consume non programmatically and even in Lambda.

When invoking the function in Lambda, besides the error that is logged in the CloudWatch, the error itself is summarized in the response which is more similar to what you're describing above (see screenshot): image

While looking at the issue, I have tried to find a way around it and we could potentially add the individual errors to the stack trace, which would make the response (& error) looks like this: image

Doing this however has downsides:

  • We are prioritizing readability of the response vs cost (we are now logging more data)
  • If the number of failed records in the batch is particularly high, the trace and response could be come too big and hit other limits like the one related to the response (we might mitigate this by logging only a fixed number of errors and leave others in the actual log)

In any case, looking at the error format and stack trace in OP under the "Current Behaviour" section, it appears that the code is not being run in Lambda, could you confirm this? If this is the case, it could be that the extra properties of the error are not getting logged.

dreamorosi avatar Oct 11 '23 13:10 dreamorosi

Okay thanks for looking into this. I'm using Serverless Stack which runs lambdas locally for development. I'll mention it to them. https://github.com/aws-powertools/powertools-lambda-typescript/blob/f33c4c5edeca42825e0eb7c0a1d12029c6558eca/packages/batch/src/BasePartialBatchProcessor.ts#L49 is this what throws the error with properties describing the batch item failures? and it's not printing out all the fields of the error object?

revmischa avatar Oct 11 '23 21:10 revmischa

Yes, that's the line that throws the error, and if you look at the error implementation you'll see the property that I mentioned.

Please let me know how it goes on their side and feel free to link the issue.

I'll leave the issue open here and if there's enough demand, the option of adding the errors in the stack trace is still available on our side.

dreamorosi avatar Oct 11 '23 22:10 dreamorosi

Related: https://github.com/sst/console/commit/dbacd376a9d437f68cf6e25e71940cfc499444d3

revmischa avatar Oct 24 '23 17:10 revmischa

Oh wow, that's cool!

Do you by any chance have a simple reproduction example or some bullet points on how I can test this?

dreamorosi avatar Oct 24 '23 17:10 dreamorosi

Sure, SST's console is free at console.sst.dev You can create a SST project and try it out with sst dev. Live local lambda development is really nice. https://sst.dev/

revmischa avatar Oct 24 '23 17:10 revmischa

Yep, I know SST, I was referring to whether you had a sample I could deploy, but no worries - I found this one and got set it up.

Do you know if the fix has been applied also to the old console, or just to the new one? I tested on the old one and still see it like this: image

I can't easily test the new one at the moment due to the cross-account IAM permission that it requires, so if you do, let me know if it worked for you though

dreamorosi avatar Oct 24 '23 19:10 dreamorosi

I don't have a repo sorry

revmischa avatar Oct 25 '23 05:10 revmischa

Going to mark this as resolved due to no updates - if the issue persists on our side, please reopen.

dreamorosi avatar Jul 30 '24 13:07 dreamorosi

⚠️ COMMENT VISIBILITY WARNING ⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Jul 30 '24 13:07 github-actions[bot]