aws-cloudformation-templates icon indicating copy to clipboard operation
aws-cloudformation-templates copied to clipboard

crhelper.py 'timeout' logger is persisted between executions

Open danieljamesscott opened this issue 6 years ago • 7 comments

The 'timeout' function of crhelper.py

https://github.com/awslabs/aws-cloudformation-templates/blob/master/community/custom_resources/python_custom_resource_helper/crhelper.py#L91

appears to persist across executions of the lambda. Output from my logs:

START RequestId: 3df43426-e759-11e8-aff6-b1e7e3385b5e Version: $LATEST
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:31:51,929][INFO] Handling event
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:31:51,929][INFO] Lambda RequestId: 3df43426-e759-11e8-aff6-b1e7e3385b5e CloudFormation RequestId: 34aadf4b-279d-4712-8371-41fcfd235262
... trimmed extra log lines
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:31:52,173][INFO] Update complete returning new physicial id 5beaee6831e2bc2eaf94adf1, old_id 5beae8d431e2bc2eaf94a80c
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:31:52,174][INFO] Completed successfully, sending response to cfn
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:31:52,438][INFO] CloudFormation returned status code: OK
END RequestId: 3df43426-e759-11e8-aff6-b1e7e3385b5e
REPORT RequestId: 3df43426-e759-11e8-aff6-b1e7e3385b5e	Duration: 509.22 ms	Billed Duration: 600 ms Memory Size: 128 MB	Max Memory Used: 31 MB
START RequestId: 415fe7fd-e759-11e8-9e47-abc0bdddf7ef Version: $LATEST
[0f0b6f2f-0b00-4284-8402-47a973dd5cd0][2018-11-13 15:31:57,054][INFO] Handling event
[0f0b6f2f-0b00-4284-8402-47a973dd5cd0][2018-11-13 15:31:57,055][INFO] Lambda RequestId: 415fe7fd-e759-11e8-9e47-abc0bdddf7ef CloudFormation RequestId: 0f0b6f2f-0b00-4284-8402-47a973dd5cd0
... trimmed extra log lines
[0f0b6f2f-0b00-4284-8402-47a973dd5cd0][2018-11-13 15:31:57,075][INFO] Completed successfully, sending response to cfn
[0f0b6f2f-0b00-4284-8402-47a973dd5cd0][2018-11-13 15:31:57,265][INFO] CloudFormation returned status code: OK
END RequestId: 415fe7fd-e759-11e8-9e47-abc0bdddf7ef
REPORT RequestId: 415fe7fd-e759-11e8-9e47-abc0bdddf7ef	Duration: 211.26 ms	Billed Duration: 300 ms Memory Size: 128 MB	Max Memory Used: 31 MB
START RequestId: 6a652871-e759-11e8-8441-65b8ac4aa8ef Version: $LATEST
[54fe2a17-c453-41a9-a46c-538e88ab7e0a][2018-11-13 15:33:05,877][INFO] Handling event
[54fe2a17-c453-41a9-a46c-538e88ab7e0a][2018-11-13 15:33:05,877][INFO] Lambda RequestId: 6a652871-e759-11e8-8441-65b8ac4aa8ef CloudFormation RequestId: 54fe2a17-c453-41a9-a46c-538e88ab7e0a
[34aadf4b-279d-4712-8371-41fcfd235262][2018-11-13 15:33:05,877][ERROR] Execution is about to time out, sending failure message
[0f0b6f2f-0b00-4284-8402-47a973dd5cd0][2018-11-13 15:33:05,880][ERROR] Execution is about to time out, sending failure message
[54fe2a17-c453-41a9-a46c-538e88ab7e0a][2018-11-13 15:33:05,881][INFO] Received a Update Request

As you can see, with my lambda timeout set to 10s, the third execution (the first two complete within 10 seconds) of the lambda contains log lines related to the first two requests. Does this mean that the background thread is suspended while the lambda is inactive and continues when the lambda next starts?

I believe that the code should remove the timer as a final step, after successful completion of the lambda. Would this be a good idea?

danieljamesscott avatar Nov 13 '18 15:11 danieljamesscott

I just saw that the finally block with t.cancel() was removed by this PR:

https://github.com/awslabs/aws-cloudformation-templates/commit/67d4f2a8244fee0d6c44c6788f2bd7e83ae25428#diff-caf0cd039f3147afdf4f64683b74dad7

Should it be restored?

danieljamesscott avatar Nov 13 '18 15:11 danieljamesscott

@danieljamesscott I agree, but whole code of crhelper is 💩, I'm working on and more clean version with hopefully some tests written!

1oglop1 avatar Nov 26 '18 10:11 1oglop1

That will help!

I've encountered some other issues to do with logging:

https://forum.serverless.com/t/python-lambda-logging-duplication-workaround/1585

Might be nice to include this fix too.

danieljamesscott avatar Nov 26 '18 10:11 danieljamesscott

This is my version, it's py3.6+ compatible, but easy to modify for older pythons https://github.com/1oglop1/aws-cloudformation-templates/blob/master/community/custom_resources/python_custom_resource_helper/crhelper.py

lambda_handler.py

from crhelper import cfn_handler
def create(event, content):
   """Create resource."""
    physical_resource_id = context.log_stream_name
    return physical_resource_id, {"Some": "Data"}
def update(event, context)
    """Update resource."""
    physical_resource_id = event['PhysicalResourceId']
    return physical_resource_id, {}
def delete(event):
    """Delete resource."""
    return event['PhysicalResourceId']
def lambda_handler(event, context):
    cfn_handler(event, context, create, update, delete)

In case you want detailed logging for boto and crhelper, I'd suggest to use common approach to logging config or simply add these 2 lines in your lambda handler

logging_level=logging.DEBUG
logging.getLogger("botocore").setLevel(logging_level)
logging.getLogger("crhelper").setLevel(logging_level)

You can always write your logic to handle env vars to have it customisable, but that should be your application work, not a library what crhelper should be.

1oglop1 avatar Nov 26 '18 14:11 1oglop1

@danieljamesscott I'd like to have your opinion and other who read this:

I was thinking what would be the best Delete behaviour I came up with 2 options:

  1. current state - delete must return physical_resource_id - the value is up on the developer to handle - and must be same as produced by create If developer fails to return correct PhysicalResourceId - stack will hang up
def delete(event, context):
    """Delete resource."""
    return event['PhysicalResourceId']
  1. delete will not return anything and send_cfn will use event['PhysicalResourceId'] behind the scenes - because it can be either value which came from create or automatic value context.log_stream_name - here the delete should not hang up as the event['PhysicalResourceId'] is always same as it came from create
def delete(event, context):
    """Delete resource."""
    pass 

1oglop1 avatar Nov 27 '18 11:11 1oglop1

2nd option is probably better? If we can avoid hanging stacks, that would be best.

danieljamesscott avatar Nov 27 '18 11:11 danieljamesscott

This seems to still be an issue? It's breaking my CustomResource because it will always return a FAILED status, even if it already returned a SUCCESS status resulting in it being stuck in an infinite loop.

Lewenhaupt avatar Dec 08 '21 22:12 Lewenhaupt

This implementation has been deprecated in favor of https://github.com/aws-cloudformation/custom-resource-helper which has a richer feature set, is cleaner to implement and can be installed as a pip module.

mleziva avatar Oct 06 '23 15:10 mleziva