Feature request: BasePartialProcessor hides exceptions by default
Expected Behaviour
As a developer, I expect any exceptions that propagate out of my handler to be logged by default with the typical stack trace, etc. That is standard Lambda behavior.
Current Behaviour
When using the Powertools batch wrapper/dispatcher process_partial_response, exceptions raised from my handler are not logged as exceptions, unless every message in the batch raises an exception.
Code snippet
def failure_handler(self, record, exception: ExceptionInfo) -> FailureResponse:
    """
    Keeps track of batch records that failed processing

    Parameters
    ----------
    record: Any
        record that failed processing
    exception: ExceptionInfo
        Exception information containing type, value, and traceback (sys.exc_info())

    Returns
    -------
    FailureResponse
        "fail", exceptions args, original record
    """
    exception_string = f"{exception[0]}:{exception[1]}"
    entry = ("fail", exception_string, record)
    logger.debug(f"Record processing exception: {exception_string}")
    self.exceptions.append(exception)
    self.fail_messages.append(record)
    return entry
Possible Solution
In BasePartialProcessor.failure_handler the exception is logged with a logger.debug call. This is problematic because it:
- lacks stack trace information
- is not typically output to the log by default
- requires enabling all DEBUG-level logs in the logging configuration in order to surface
A call to logger.exception(), perhaps here, would solve the problem nicely.
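A minimal sketch of that change (illustrative only, not the library's actual implementation; it assumes failure_handler is still running inside the processor's except block so the active traceback is available):

def failure_handler(self, record, exception: ExceptionInfo) -> FailureResponse:
    exception_string = f"{exception[0]}:{exception[1]}"
    entry = ("fail", exception_string, record)
    # logger.exception() logs at ERROR level and attaches the active stack trace;
    # passing exc_info=exception explicitly would also work and does not depend
    # on being inside an except block.
    logger.exception(f"Record processing exception: {exception_string}")
    self.exceptions.append(exception)
    self.fail_messages.append(record)
    return entry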
Steps to Reproduce
Using BatchProcessor and process_partial_response, throw an exception from one of the handler invocations in the batch.
Note that nothing is logged for the exception.
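For example, a minimal reproduction sketch (illustrative; the handler and event wiring follow the standard Powertools batch example):

from aws_lambda_powertools.utilities.batch import BatchProcessor, EventType, process_partial_response

processor = BatchProcessor(event_type=EventType.SQS)


def record_handler(record):
    # Raising here marks the record as a partial failure, but nothing is logged for it.
    raise ValueError(f"boom: {record.message_id}")


def lambda_handler(event, context):
    return process_partial_response(event=event, record_handler=record_handler, processor=processor)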
Powertools for AWS Lambda (Python) version
latest
AWS Lambda function runtime
3.9
Packaging format used
PyPi
Debugging logs
Hey @BrandonLWhite! Thanks a lot for opening this issue - I really love when customers open issues to improve the developer experience.
I fully agree with your point about logging exceptions when records fail to increase observability. I think this would be really helpful for debugging, but I'm wondering about the best way to implement it.
I'm thinking we have two options:
- Make exception logging the default behavior
- Introduce a new flag where customers can choose whether we log exceptions or not
To be honest, I'm not sure if we should make this the default behavior. The reason I'm hesitant is that customers can already wrap their logic in try/except blocks when processing their records if they want custom exception handling. Something like this:
from __future__ import annotations

from dataclasses import dataclass
import json

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.batch import BatchProcessor, EventType, process_partial_response
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord

processor = BatchProcessor(event_type=EventType.SQS)
logger = Logger()


def record_handler(record: SQSRecord):
    payload: str = record.body
    if payload:
        try:
            item: dict = json.loads(payload)
            logger.info(item)
        except Exception as exc:
            logger.exception(exc)


@logger.inject_lambda_context
def lambda_handler(event, context):
    return process_partial_response(event=event, record_handler=record_handler, processor=processor)
I think my main concern is that if we introduce logger.exception by default, it might cause unexpected side effects in customers' observability stacks - like increased log volume, alert noise, or different log parsing behavior.
Maybe we could add an optional parameter like log_exceptions=True to the BatchProcessor? That way customers who want this behavior can opt-in, but we don't change the default behavior for existing users.
What do you think about that approach? I'd love to hear your thoughts.
I understand the conundrum. I do agree that it's prudent to have a flag log_exceptions. The decision is what to make the default:
- False: avoids the potential double-logging you described, but requires developers to take an extra step to get automatic exception logging, which could arguably be viewed as expected out-of-the-box behavior.
- True: the default aligns with "vanilla" Lambda behavior: if you raise an exception from a Lambda handler, it gets logged with a stack trace in CloudWatch.
The problem with logging exceptions (or the lack-thereof in this case) is that once you realize it isn't happening it is probably too late! You have failed SQS messages in a DLQ (for example) and nothing in the logs to help you troubleshoot.
My preference would be default to True. The way I see it, double-logging exceptions for a subset of users who are doing their own logging via try/except/log/raise is better than not logging anything for all other users.
Having said that, I could certainly live with defaulting to False. At my org, we'd just make sure to document the need to set it to True and probably add it to static analysis checks.
Lastly, you might consider defaulting it to False for now, documenting the intent to change the default to True later. Then in a future version (perhaps the next major one) you flip it and call it out in the release notes.
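For illustration, until any such flag exists, the opt-in behavior can be approximated with a small subclass; this is a sketch that reuses the failure_handler extension point shown above, and LoggingBatchProcessor is a hypothetical name:

import logging

from aws_lambda_powertools.utilities.batch import BatchProcessor, EventType

logger = logging.getLogger(__name__)


class LoggingBatchProcessor(BatchProcessor):
    """Hypothetical subclass that logs every failed record with its stack trace."""

    def failure_handler(self, record, exception):
        # exception is the (type, value, traceback) tuple captured by the processor,
        # so passing it as exc_info reproduces the full stack trace in the log entry.
        logger.error("Record processing exception", exc_info=exception)
        return super().failure_handler(record, exception)


processor = LoggingBatchProcessor(event_type=EventType.SQS)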
A suggestion if I may: add support for using the customer's Logger instance in Batch, then log the exception if one is registered.
This meets customers halfway and sets a springboard for a future UX pattern that also brings the Metrics and Tracer instances for increased instrumentation.
It helps reduce flag scope creep and reduces exposure for customers creating 1PB+ of logs daily.
Even if you use logger.exception, the default handler is a null handler by Python convention (requiring extra steps to surface output).
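A hypothetical sketch of that idea (the logger parameter does not exist in the current BatchProcessor API; names here are illustrative only):

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.batch import BatchProcessor, EventType

logger = Logger()

# Hypothetical: the processor receives the customer's Logger instance and logs
# record-level exceptions only when one has been registered.
processor = BatchProcessor(event_type=EventType.SQS, logger=logger)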
Hi @heitorlessa, it's always amazing to read your suggestions here. Yes, this could be another option: instead of using the flag, allow the customer to bring their own logger instance, so we can check whether one is set and then log the exception.
What do you think, @BrandonLWhite?
Yes, that would be fine.
I must point out, though, that following through on the points made still leads me to conclude that simply calling logger.exception() unconditionally is the most sensible choice.
- The current state is that exceptions are not logged at all (for partial failures).
- There is concern about blowing up logs for existing users if these exceptions start appearing in the log.
- There is concern about adding more flags to the lib.
- The out-of-the-box Python logging config doesn't actually output these logs; you have to make a setup call (e.g. logging.basicConfig()) for that to happen.
Thus, my conclusion is that calling logger.exception() unconditionally is the best choice because:
- It adheres to the Principle of Least Astonishment (in my opinion).
- It won't actually output anything by default.
- It doesn't require any new flags or parameters to be passed in.
- It outputs unhandled exceptions to the log if the user has configured standard Python logging output, which applies to all other libraries being used.
As an example, here is a boilerplate excerpt that is typical in Lambdas at my organization:
import logging

logging.basicConfig(level=logging.INFO)

# Turn down the logging level for spammy libs here:
logging.getLogger("boto3").setLevel(logging.WARNING)
logging.getLogger("botocore").setLevel(logging.WARNING)
In the worst-case scenario, a customer who raises a large number (millions/billions) of unhandled exceptions daily as part of their Lambda's nominal behavior, AND who has logging output configured (logging.basicConfig() or similar), would see these exceptions start inflating their logs.
My belief is that the customer time lost to not having error messages (stack traces) in the logs by default, when Python logging has been configured, outweighs the potential CloudWatch cost increases possible in corner cases (which are certainly realistic ones, though).
And there is still a path to turn off exception logging via the standard Python logging .setLevel() interface for these particular cases.
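For example (assuming the batch utility logs under its own module name, which is the usual getLogger(__name__) pattern; the exact logger name is an assumption here):

import logging

# Silence the batch utility's exception logging for this particular case,
# using only the standard logging configuration interface.
logging.getLogger("aws_lambda_powertools.utilities.batch").setLevel(logging.CRITICAL)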
In summary, I think it is better for a library like this one to err on the side of surfacing error details by default, while providing a way to silence those error messages for use cases needing it.