powertools-lambda-python icon indicating copy to clipboard operation
powertools-lambda-python copied to clipboard

feat(logger): add thread safe logging keys

Open SimonBFrank opened this issue 1 year ago • 12 comments

Issue number: #991

Summary

Changes

  • New logger methods:
    • append_thread_local_keys : add key to thread-local ContextVar
    • get_current_thread_keys: get all keys from thread-local ContextVar
    • remove_thread_local_keys: remove list of keys from thread-local ContextVar
    • clear_thread_local_keys: remove all keys from thread-local ContextVar
  • Add new logic to_extract_log_keys method to handle error when an additional logging key's value is not a string and the name conflicts with a reserved keys
  • clear_state in inject_lambda_context also clears thread-local keys

User experience

Before:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.info("Start processing payment", extra={"payment_id": payment_id})
    ...
    logger.info("End processing payment", extra={"payment_id": payment_id})


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

After:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.clear_thread_local_keys()
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?: No, existing functionality isn't broken.RFC issue number:

Checklist:

  • [ ] Migration process documented
  • [ ] Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

SimonBFrank avatar Sep 08 '24 18:09 SimonBFrank

Thanks a lot for your first contribution! Please check out our contributing guidelines and don't hesitate to ask whatever you need. In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

boring-cyborg[bot] avatar Sep 08 '24 18:09 boring-cyborg[bot]

Assigned to @sthulb.

leandrodamascena avatar Sep 09 '24 15:09 leandrodamascena

@sthulb I have made some changes to my PR since the last batch of tests failed due to type checking and | not being supported in Python versions < 3.9. Can you please allow the additional test cases to run?

SimonBFrank avatar Sep 09 '24 20:09 SimonBFrank

@SimonBFrank We have paused all development and merging in to develop/v3 branches until v3 has gone GA, which is currently scheduled for the 23rd now.

We will review and merge after that. Thank you for this contribution.

sthulb avatar Sep 13 '24 09:09 sthulb

This is fine, just got to rebase it into v3

sthulb avatar Sep 27 '24 12:09 sthulb

I've done the work to rebase @SimonBFrank's commits on to the v3 branch. This is now fine to merge (assuming tests pass)

sthulb avatar Sep 27 '24 13:09 sthulb

Codecov Report

Attention: Patch coverage is 85.10638% with 7 lines in your changes missing coverage. Please review.

Project coverage is 96.15%. Comparing base (0ccd6b3) to head (9a7ed53). Report is 11 commits behind head on develop.

Files with missing lines Patch % Lines
aws_lambda_powertools/logging/formatter.py 81.57% 5 Missing and 2 partials :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5141      +/-   ##
===========================================
- Coverage    96.20%   96.15%   -0.05%     
===========================================
  Files          229      229              
  Lines        10761    10806      +45     
  Branches      2001     2006       +5     
===========================================
+ Hits         10353    10391      +38     
- Misses         322      327       +5     
- Partials        86       88       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 27 '24 13:09 codecov[bot]

Hello @leandrodamascena, thanks for the detailed review. What is the release process for new code? Once it is merged and released, is it imediately accessible or will it get added to a batch or changes?

2 - I'm thinking of the name ...._thread_local... is that the best name? Maybe ...._thread_...? I see you used get_current_thread_keys and not get_current_thread_local_keys, typo or was it intentional? Because I really like having just thread in the name.

I am not attached to the naming convention. I find thead_local a little weird. I just followed the initial method signatures in the issue. I think replacing thread_local with thread_safe or just thread would totally be fine.

3 - The changes in the documentation are amazing, the examples are very real and someone can understand them easily. But I think this is a bit confusing for the customer who is trying to understand how to use keys with multithreading and not have a specific section for this. I'm thinking about adding a specific section about thread-safe keys and adding this new documentation there. This way we can create a callout at the beginning of the page talking about this and redirect the warnings about "non-thread safe" methods to this new section. I'll work on this on Monday.

Sounds good! 👍🏻 Hopefully my initial work is useful for you.

4 - We need to add additional tests for non-covered area like the change you did in the _extract_log_keys method. BTW, I really like that, it's improve the developer experience.

If I get time this weekend, I will add some more test cases.

5 - Even though this is working perfectly locally, we need to add some e2e tests to make sure we are having the same local behavior in the Lambda environment. We know that the concept of multithreading is universal and works in any environment, but it's better to be safe and having tests directly with a real function.

Same here. If I get time this weekend, I will add some more test cases.

SimonBFrank avatar Sep 28 '24 01:09 SimonBFrank

Hello @leandrodamascena, thanks for the detailed review. What is the release process for new code? Once it is merged and released, is it imediately accessible or will it get added to a batch or changes?

Hey @SimonBFrank our release process is as follows:

1 - We use Trunk-based development and we merge every new change into develop 2 - Every day (except weekends) at 8AM UTC we release an alpha version (check pypi) of this branch. So, customers can tests changes that were merged the day before and anticipate any test or problem. 3 - If we don't have any critical bug, regression or security issue that forces us to release immediately, we release a new production version of this branch every 15 days. This is not set in stone, but we try to maintain this consistency as much as possible.

2 - I'm thinking of the name ...._thread_local... is that the best name? Maybe ...._thread_...? I see you used get_current_thread_keys and not get_current_thread_local_keys, typo or was it intentional? Because I really like having just thread in the name.

I am not attached to the naming convention. I find thead_local a little weird. I just followed the initial method signatures in the issue. I think replacing thread_local with thread_safe or just thread would totally be fine.

I like the *_thread_safe_* name.

4 - We need to add additional tests for non-covered area like the change you did in the _extract_log_keys method. BTW, I really like that, it's improve the developer experience.

If I get time this weekend, I will add some more test cases.

Take your time and try to enjoy the weekend off. No rush to add these new tests (functional and e2e), I can help with that and we can try to include this change in our next release next week.

Thanks.

leandrodamascena avatar Sep 28 '24 09:09 leandrodamascena

Hi @SimonBFrank! I've been thinking about our customer experience all weekend, both in terms of our documentation (which we agree needs improvement) and our default behavior. I have some thoughts on this PR.

We're creating new methods that are thread-safe and calling them "local" or whatever name we call, but we're also keeping the old ones. This reminds me of the concept of function coloring in sync/async space. In our case, we're creating a similar situation with our thread-safe "local" methods and the older, non-thread-safe versions. This might make it tricky for developers to use them together, just like mixing "safe" and "non-safe" functions can be challenging.

This morning, I had a meeting with @dreamorosi. He asked some interesting questions:

  • Why don't we make the thread-safe behavior the default?
  • How is this slower than the non-thread-safe method?
  • What possible breaking changes could this cause for our customers?

These are good questions to think about. Let's break them down:

Why don't we make the thread-safe behavior the default?

Making thread-safe behavior the default might seem like a good idea, but it can have drawbacks. I didn't inspect the code yet, but it seems possible to do. The Python logging library is thread-safe by default, so the places we inherit from there like extra_keys (like logger.info and others) this is thread-safe by default.

How is this slower than the non-thread-safe method?

Thread-safe methods can be slower because they require extra steps to ensure they work correctly when multiple threads use them at the same time, and I created a test to test this (the code is below - just example). In the tests I didn't notice any regression regarding the performance, so, I don't this this is a problem.

Non-thread safe REPORT RequestId: e01e70e8-2948-4061-80ed-aff592bea6a0 Duration: 105267.26 ms Billed Duration: 105268 ms Memory Size: 1024 MB Max Memory Used: 784 MB

import json
import io
from aws_lambda_powertools import Logger

my_logger = Logger(stream=io.StringIO())

large_object = {
    'prop1': 'prop',
    'prop2': 'prop',
    'prop3': 'prop',
    'prop4': 'prop',
    'prop5': 'prop',
    'prop6': 'prop',
    'prop7': 'prop',
    'prop8': 'prop',
    'prop9': 'prop',
    'prop10': 'prop',
    'prop11': 'prop',
    'prop12': 'prop',
    'prop13': 'prop',
    'prop14': 'prop',
    'prop15': 'prop',
    'prop16': 'prop',
    'prop17': 'prop',
    'prop18': 'prop',
    'prop19': 'prop',
    'prop20': 'prop',
    'prop21': 'prop',
    'prop22': 'prop',
    'prop23': 'prop',
    'prop24': 'prop',
    'prop25': 'prop',
    'prop26': 'prop',
    'prop27': 'prop',
    'prop28': 'prop',
}

def lambda_handler(event, context):
    for i in range(1,100000):
        my_logger.append_keys(**large_object)
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.remove_keys(list(large_object.keys()))

Thread safe REPORT RequestId: 163aaa0d-b1e2-42ed-be31-40f848f07e01 Duration: 106993.14 ms Billed Duration: 106994 ms Memory Size: 1024 MB Max Memory Used: 790 MB

import json
import io
from aws_lambda_powertools import Logger

my_logger = Logger(stream=io.StringIO())

large_object = {
    'prop1': 'prop',
    'prop2': 'prop',
    'prop3': 'prop',
    'prop4': 'prop',
    'prop5': 'prop',
    'prop6': 'prop',
    'prop7': 'prop',
    'prop8': 'prop',
    'prop9': 'prop',
    'prop10': 'prop',
    'prop11': 'prop',
    'prop12': 'prop',
    'prop13': 'prop',
    'prop14': 'prop',
    'prop15': 'prop',
    'prop16': 'prop',
    'prop17': 'prop',
    'prop18': 'prop',
    'prop19': 'prop',
    'prop20': 'prop',
    'prop21': 'prop',
    'prop22': 'prop',
    'prop23': 'prop',
    'prop24': 'prop',
    'prop25': 'prop',
    'prop26': 'prop',
    'prop27': 'prop',
    'prop28': 'prop',
}


def lambda_handler(event, context):
    # TODO implement
    for i in range(1,100000):
        my_logger.append_thread_local_keys(**large_object)
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.info("test")
        my_logger.remove_thread_local_keys(list(large_object.keys()))

How is this slower than the non-thread-safe method?

Changing to thread-safe methods could break some customers' existing code. For example, if they rely on the current behavior or performance of our non-thread-safe methods, switching to thread-safe versions might cause unexpected problems or slowdowns in their programs. We've seen that performance isn't an issue we can expect, but I don't know about behavior. One thing we can look at is our tests, if we don't need to make any changes to our tests, that means we're potentially not breaking anything.

What do you think about this? Do you have time to invest in some testing to make thread-safe the default and only behavior?

Again, thank you so much for investing time into this and I would love to hear your thoughts.

leandrodamascena avatar Sep 30 '24 15:09 leandrodamascena

@leandrodamascena Hey, thanks for the feedback. My responses may be delayed for the next month as my personal life is getting busy.

I think making the logger thread safe by default would not be what I would want as a feature. Let me explain through an example.

I have a Lambda that processes an order that has multiple items. Each item within an order is processed in a separate thread. When I start my function I used append_keys to add the order_id. Then, at the beginning of each thread I use append_thread_local_keys to add the item_id. Example code below:

Code:

import threading
from typing import List

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()


def threaded_func(item_id: str):
    logger.append_thread_local_keys(item_id=item_id)
    logger.info("Collecting payment")


def lambda_handler(event: dict, context: LambdaContext) -> str:
    order_id: str = event["order_id"]
    item_ids: List[str] = event["item_ids"]

    logger.append_keys(order_id=order_id)

    for item in item_ids:
        threading.Thread(target=threaded_func, args=(item,)).start()

    return "Success"

Input:

lambda_handler({"order_id": "abc123", "item_ids": ["A1", "A2"]}, {})

Output:

{"level":"INFO","location":"threaded_func:12","message":"Collecting payment","timestamp":"2024-10-02 22:46:16,678-0400","service":"service_undefined","order_id":"abc123","item_id":"A1"}
{"level":"INFO","location":"threaded_func:12","message":"Collecting payment","timestamp":"2024-10-02 22:46:16,678-0400","service":"service_undefined","order_id":"abc123","item_id":"A2"}

This is great because I only need to add the order_id once in the beginning of the Lambda call and will be used everywhere. I can use append_keys as "global" logging keys. But, thread specific information (item_id) can be added to each thread. If we were to make append_keys be thread specific, then we would have to add it in every thread. While this is possible, I find this a lot less user friendly. Additionally, I would imagine making append_keys thread safe would break existing behavior which opens a whole new can of worms. Let me know your thoughts.

SimonBFrank avatar Oct 03 '24 02:10 SimonBFrank

Hi @SimonBFrank! Thanks for the feedback! Last week was very busy and I'll check it out this week to merge the PR.

leandrodamascena avatar Oct 07 '24 09:10 leandrodamascena

Hey @leandrodamascena, any updates on getting this reviewed and merged?

SimonBFrank avatar Oct 16 '24 00:10 SimonBFrank

Hey @leandrodamascena, any updates on getting this reviewed and merged?

Hey @SimonBFrank! I've had to deal with some unexpected internal reprioritizations this past week. I dropped the ball a bit here, but tomorrow I'll push some small changes and tests for us to merge, ok?

Thanks

leandrodamascena avatar Oct 16 '24 23:10 leandrodamascena

Update 21/10/2024:

  • New logger methods:
    • thread_safe_append_keys : add key to thread-local ContextVar
    • thread_safe_get_current_keys: get all keys from thread-local ContextVar
    • thread_safe_remove_keys: remove list of keys from thread-local ContextVar
    • thread_safe_clear_keys: remove all keys from thread-local ContextVar
  • Add new logic to_extract_log_keys method to handle error when an additional logging key's value is not a string and the name conflicts with a reserved keys
  • clear_state in inject_lambda_context also clears thread-local keys

leandrodamascena avatar Oct 21 '24 19:10 leandrodamascena

Awesome work, congrats on your first merged pull request and thank you for helping improve everyone's experience!

boring-cyborg[bot] avatar Oct 22 '24 08:10 boring-cyborg[bot]

Hi @SimonBFrank! Thanks for your patience and hard work on this PR. I sent a commit with some small changes, but basically I renamed the methods and refactored the documentation to make it more specific.

Please let me know if you have any observation before I ask the @sthulb and merge it.

Looks amazing to me. I will let me team know that this is ready to use. Thanks for the help!

SimonBFrank avatar Oct 23 '24 01:10 SimonBFrank