powertools-lambda-python

Feature request: Add persistent keys option to Logger

Open jth08527 opened this issue 10 months ago • 5 comments

Use case

Add custom keys to logs that should persist on every lambda execution and NOT be cleared when using clear_state.

This functionality already exists in the TypeScript version of powertools and can be useful for Python as well.

Solution/User Experience

Just like in the TypeScript version, add a new optional parameter to the Logger initializer. For example, the new option can be called persistent_keys.

This parameter could be a simple flat dict (or a similar type?) and, if not provided, default to an empty dict.

The value of this new parameter could either be added to self._default_log_keys (since whatever is in there already persists through a state clearing) or be maintained separately but treated just like self._default_log_keys.

Example init:

import os
from aws_lambda_powertools import Logger

logger = Logger(persistent_keys={"role": os.getenv("MY_ROLE", "MY_ROLE")})

Example storing keys:

class Logger:
    ...
    def __init__(
        ...
        persistent_keys: dict[str, Any] | None = None,  # None avoids a shared mutable default
        ...
    ) -> None:
        ...
        self._default_log_keys = {"service": self.service, "sampling_rate": self.sampling_rate}
        if persistent_keys:
            self._default_log_keys.update(persistent_keys)
        ...
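
To make the proposal concrete, here is a minimal, self-contained sketch of the intended behaviour. ToyLogger is a hypothetical stand-in and not the Powertools Logger; it only models the idea that keys passed to the constructor are merged into the defaults and therefore survive clear_state().

```python
from __future__ import annotations

from typing import Any


class ToyLogger:
    """Hypothetical stand-in for Logger; it models key handling only."""

    def __init__(
        self,
        service: str = "service_undefined",
        persistent_keys: dict[str, Any] | None = None,
    ) -> None:
        # Keys that are restored after every clear_state() call.
        self._default_log_keys: dict[str, Any] = {"service": service}
        if persistent_keys:
            self._default_log_keys.update(persistent_keys)
        # Keys currently attached to every log line.
        self.current_keys: dict[str, Any] = dict(self._default_log_keys)

    def append_keys(self, **additional_keys: Any) -> None:
        # Temporary keys: they live only until the next clear_state().
        self.current_keys.update(additional_keys)

    def clear_state(self) -> None:
        # Drop everything, then restore the defaults (persistent keys included).
        self.current_keys = dict(self._default_log_keys)


logger = ToyLogger(service="payment", persistent_keys={"role": "worker"})
logger.append_keys(user_id="123")
logger.clear_state()
print(logger.current_keys)
```

After clear_state(), user_id is gone while service and role remain, which is the behaviour requested above.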

Alternative solutions


Acknowledgment

jth08527, Feb 03 '25 15:02

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

boring-cyborg[bot], Feb 03 '25 15:02

Hi @jth08527! Thanks a lot for opening this issue! I'll need to research its impact, because I'm not sure whether having both append_keys and persistent_keys would make the API confusing. I'm adding this to our backlog and aiming to take a look at it later this month.

leandrodamascena, Feb 10 '25 11:02

Hey @leandrodamascena! 👋 I've done a comprehensive analysis of the potential API confusion between append_keys and persistent_keys that you mentioned in issue #6002. Here's my detailed investigation and recommendations.

🔍 Issue Summary

The request is to add a persistent_keys parameter to Logger (similar to TypeScript version) that would persist even when clear_state() is called, unlike the current append_keys() which gets cleared.

🎯 Current Python Implementation Analysis

Detailed Code Flow Analysis:

1. Logger Constructor (__init__)

def __init__(self, service=None, sampling_rate=None, **kwargs):
    self.service = resolve_env_var_choice(choice=service, env=os.getenv(constants.SERVICE_NAME_ENV, "service_undefined"))
    self.sampling_rate = resolve_env_var_choice(choice=sampling_rate, env=os.getenv(constants.LOGGER_LOG_SAMPLING_RATE))
    
    # These keys ALWAYS persist through clear_state()
    self._default_log_keys = {"service": self.service, "sampling_rate": self.sampling_rate}

2. append_keys() Method Chain:

# Logger.append_keys() - delegates to formatter
def append_keys(self, **additional_keys: object) -> None:
    self.registered_formatter.append_keys(**additional_keys)

# LambdaPowertoolsFormatter.append_keys() - actual implementation  
def append_keys(self, **additional_keys) -> None:
    self.log_format.update(additional_keys)  # Direct dict update

3. clear_state() Method Chain:

# Logger.clear_state() - resets and restores defaults
def clear_state(self) -> None:
    self.registered_formatter.clear_state()  # Clear formatter state
    self.structure_logs(**self._default_log_keys)  # Restore defaults

# LambdaPowertoolsFormatter.clear_state() - actual clearing
def clear_state(self) -> None:
    self.log_format = dict.fromkeys(self.log_record_order)  # Reset structure
    self.log_format.update(**self.keys_combined)  # Restore constructor keys

4. structure_logs() Method (Key Restoration):

def structure_logs(self, append: bool = False, formatter_options: dict | None = None, **keys) -> None:
    log_keys = {**self._default_log_keys, **keys}  # Merge defaults with new keys
    
    # Mode 3: Clear existing and add new keys (used by clear_state)
    if not append:
        self.registered_formatter.clear_state()
        self.registered_formatter.thread_safe_clear_keys()
        self.registered_formatter.append_keys(**log_keys)

Critical Current Behaviors:

  1. append_keys() → Directly updates formatter.log_format dict (temporary keys)
  2. clear_state() → Resets formatter, then restores _default_log_keys via structure_logs()
  3. _default_log_keys → {"service": self.service, "sampling_rate": self.sampling_rate} - ALWAYS survive
  4. Keys precedence → Later keys override earlier ones (simple dict update)
  5. Thread safety → Uses ContextVar for thread-local temporary keys

Current State Management:

  • Logger Level: Tracks _default_log_keys only
  • Formatter Level: Manages all keys in log_format dict + thread-local ContextVar
  • No separation: All non-default keys treated as temporary
  • Clear behavior: Nuclear reset + selective restoration of defaults

Current Flow Example:

logger = Logger(service="payment")  # _default_log_keys = {"service": "payment", "sampling_rate": None}
logger.append_keys(user_id="123")   # formatter.log_format = {..., "user_id": "123"}
logger.append_keys(session="abc")   # formatter.log_format = {..., "user_id": "123", "session": "abc"}
logger.clear_state()                # Reset everything, restore only service/sampling_rate
# Result: Only service="payment" and sampling_rate persist

🚨 API Confusion Concerns (Validated!)

1. Naming & Semantic Confusion

# This is confusing for developers:
logger.append_keys(user_id="123")        # Temporary key
logger.persistent_keys = {"env": "prod"}  # Persistent key  
logger.clear_state()                     # Only clears user_id, keeps env

# Users won't intuitively understand the difference!

2. TypeScript vs Python Inconsistency

  • TypeScript: resetKeys() vs Python: clear_state()
  • TypeScript: Has both appendKeys() AND appendPersistentKeys()
  • Python: Only has append_keys() currently

3. Multiple Ways to Set Persistent Data

# Currently, these both persist through clear_state():
logger = Logger(service="payment")  # Via _default_log_keys
logger.persistent_keys = {"env": "prod"}  # New proposed way

# This creates confusion about what persists and why

🔧 Technical Implementation Challenges

1. State Management Complexity

class Logger:
    def __init__(self, persistent_keys=None):
        self._default_log_keys = {"service": self.service, "sampling_rate": self.sampling_rate}
        self._persistent_keys = persistent_keys or {}  # NEW
        self._temporary_keys = {}  # NEW - need to track separately

2. Key Conflict Resolution

# What happens here?
logger.append_keys(environment="staging")      # Temporary
logger.persistent_keys = {"environment": "prod"}  # Persistent  
logger.info("test")  # Which environment value wins?
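
One way to reason about this question: with plain dict merging, precedence is decided purely by merge order. The sketch below is an illustration only; merge_keys and its temporary_wins flag are hypothetical names, and which policy Powertools would adopt is an open design decision.

```python
from typing import Any, Dict


def merge_keys(
    persistent: Dict[str, Any],
    temporary: Dict[str, Any],
    temporary_wins: bool = True,
) -> Dict[str, Any]:
    """Combine both key sets; the dict unpacked last overrides on conflict."""
    if temporary_wins:
        return {**persistent, **temporary}
    return {**temporary, **persistent}


persistent = {"environment": "prod"}
temporary = {"environment": "staging", "user_id": "123"}

# Policy A: the temporary value shadows the persistent one for this invocation.
print(merge_keys(persistent, temporary))
# Policy B: the persistent value cannot be overridden by append_keys().
print(merge_keys(persistent, temporary, temporary_wins=False))
```

Whichever policy is chosen, it should be documented explicitly, since both are defensible.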

3. Clear State Behavior

def clear_state(self):
    # Need to preserve BOTH _default_log_keys AND _persistent_keys
    # But clear only temporary keys - complex logic needed
    ...

💡 Discovered Issues from Code Analysis

1. Formatter vs Logger Responsibility

  • Current: Logger delegates to formatter.clear_state()
  • Problem: Formatter doesn't know about Logger's persistent keys concept
  • Solution: Need coordination between Logger and Formatter

2. structure_logs() Method Overloading

  • Currently handles both initialization AND key appending
  • Adding persistent keys would make this method even more complex
  • Risk of breaking existing behavior

3. Thread Safety with Context Variables

The formatter uses ContextVar for thread-local keys:

def thread_safe_append_keys(self, **additional_keys) -> None:
    set_context_keys(**additional_keys)

Persistent keys would need similar thread-safe handling.
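
As a rough illustration of what context-local handling could look like, the sketch below stores persistent keys in a ContextVar from the standard library. The names _persistent_keys, append_persistent_keys and get_persistent_keys are assumptions made for this example and are not part of the Powertools formatter.

```python
import contextvars
import threading

# Hypothetical context variable; None means "no persistent keys set yet".
_persistent_keys = contextvars.ContextVar("persistent_keys", default=None)


def append_persistent_keys(**keys):
    current = _persistent_keys.get() or {}
    # Store a new dict instead of mutating, so other contexts are unaffected.
    _persistent_keys.set({**current, **keys})


def get_persistent_keys():
    return dict(_persistent_keys.get() or {})


results = {}


def worker(name):
    # Each thread runs in its own context, so its keys stay isolated.
    append_persistent_keys(worker=name)
    results[name] = get_persistent_keys()


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(get_persistent_keys())  # the main thread never set any keys
```

Note the trade-off: context-local storage isolates concurrent work, but keys set in one thread are not visible in another, which may or may not match what users expect from "persistent" keys.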

🎯 Recommendations

Option 1: Follow TypeScript API Exactly (RECOMMENDED)

logger = Logger(
    service="payment",
    persistent_keys={"environment": "prod", "version": "1.0"}  # Constructor
)

# Runtime methods (matching TypeScript)
logger.append_persistent_keys(region="us-east-1")
logger.remove_persistent_keys(["version"])  
logger.append_keys(user_id="123")      # Temporary (existing)
logger.clear_state()                   # Clears only temporary keys

Pros:

  • ✅ Consistent with TypeScript version
  • ✅ Clear semantic distinction
  • ✅ Explicit method names reduce confusion

Option 2: Enhance Current API with Clear Naming

logger = Logger(service="payment", permanent_log_keys={"env": "prod"})

logger.append_temporary_keys(user_id="123")  # Rename existing method
logger.append_permanent_keys(version="1.0")   # New method  
logger.clear_temporary_keys()                 # Rename existing method

Pros:

  • ✅ Very clear semantic meaning
  • ✅ Backwards compatible (with deprecation)

Option 3: Single Method with Scope Parameter

logger.append_keys(user_id="123", scope="temporary")    # Default
logger.append_keys(env="prod", scope="persistent")  
logger.clear_keys(scope="temporary")  # Default
logger.clear_keys(scope="persistent")  
logger.clear_keys(scope="all")

Pros:

  • ✅ Single consistent API

Cons:

  • ❌ More complex parameter handling

🚧 Implementation Strategy (Option 1)

Phase 1: Internal Refactoring

class Logger:
    def __init__(self, persistent_keys=None, **kwargs):
        self._default_log_keys = {"service": self.service, "sampling_rate": self.sampling_rate}
        self._persistent_keys = persistent_keys or {}
        self._all_persistent = {**self._default_log_keys, **self._persistent_keys}

    def clear_state(self):
        self.registered_formatter.clear_temporary_keys()  # NEW method
        self.structure_logs(**self._all_persistent)       # Restore all persistent

Phase 2: Add New Methods

def append_persistent_keys(self, **keys):
    self._persistent_keys.update(keys)
    self._all_persistent.update(keys)
    self.registered_formatter.update_persistent_keys(**keys)

def remove_persistent_keys(self, keys: list[str]):
    for key in keys:
        self._persistent_keys.pop(key, None)
        self._all_persistent.pop(key, None)  
    self.registered_formatter.remove_persistent_keys(keys)

Phase 3: Formatter Updates

class LambdaPowertoolsFormatter:
    def __init__(self, **kwargs):
        self._persistent_keys = {}
        self._temporary_keys = {}
        # Existing logic...

    def clear_temporary_keys(self):  # NEW
        self.log_format = dict.fromkeys(self.log_record_order)
        self.log_format.update(**self.keys_combined)  # Existing
        self.log_format.update(**self._persistent_keys)  # NEW
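
The three phases can be exercised end to end with a small runnable model. This is a sketch under simplifying assumptions: ToyFormatter and ToyLogger are hypothetical classes, the formatter is the single owner of both key sets (rather than duplicating them on the logger), and temporary keys are assumed to win on conflict.

```python
from typing import Any, Dict, List, Optional


class ToyFormatter:
    """Hypothetical formatter that keeps the two key sets apart."""

    def __init__(self) -> None:
        self.persistent_keys: Dict[str, Any] = {}
        self.temporary_keys: Dict[str, Any] = {}

    def clear_temporary_keys(self) -> None:
        self.temporary_keys = {}

    def format_keys(self) -> Dict[str, Any]:
        # Assumed policy: temporary keys override persistent ones on conflict.
        return {**self.persistent_keys, **self.temporary_keys}


class ToyLogger:
    """Hypothetical logger exposing the Option 1 method names."""

    def __init__(self, service: str, persistent_keys: Optional[Dict[str, Any]] = None) -> None:
        self.formatter = ToyFormatter()
        self.formatter.persistent_keys = {"service": service, **(persistent_keys or {})}

    def append_keys(self, **keys: Any) -> None:
        self.formatter.temporary_keys.update(keys)

    def append_persistent_keys(self, **keys: Any) -> None:
        self.formatter.persistent_keys.update(keys)

    def remove_persistent_keys(self, keys: List[str]) -> None:
        for key in keys:
            self.formatter.persistent_keys.pop(key, None)

    def clear_state(self) -> None:
        self.formatter.clear_temporary_keys()


logger = ToyLogger("payment", {"environment": "prod", "version": "1.0"})
logger.append_persistent_keys(region="us-east-1")
logger.remove_persistent_keys(["version"])
logger.append_keys(user_id="123")
before = logger.formatter.format_keys()
logger.clear_state()
after = logger.formatter.format_keys()
print(before)
print(after)
```

Keeping both dictionaries in one place avoids having to synchronise _persistent_keys and _all_persistent on every update, at the cost of diverging from the phased layout sketched above.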

⚠️ Breaking Change Considerations

Backwards Compatibility Strategy:

  1. Constructor: persistent_keys=None (optional, no breaking change)
  2. Methods: Keep existing append_keys() and clear_state() working exactly as before
  3. Deprecation: Optionally deprecate in favor of explicit append_temporary_keys()

Migration Path:

# V2 (Current) - Still works
logger.append_keys(user_id="123")
logger.clear_state()

# V3 (New) - Recommended  
logger.append_keys(user_id="123")        # Temporary (unchanged)
logger.append_persistent_keys(env="prod") # Persistent (new)
logger.clear_state()                     # Clears only temporary (unchanged behavior)

🔍 Edge Cases to Test

  1. Key Conflicts: Same key set as both temporary and persistent
  2. Clear State Timing: Multiple invocations with Lambda context reuse
  3. Thread Safety: Concurrent access to persistent vs temporary keys
  4. Memory: Large persistent key dictionaries across invocations
  5. Serialization: Ensure persistent keys don't break JSON serialization
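
For the serialization case, one possible guard is to validate persistent keys when they are registered, so a bad value fails fast instead of on every later log call. validate_persistent_keys is a hypothetical helper written for this example, not an existing Powertools function.

```python
import json
from datetime import datetime
from typing import Any, Dict


def validate_persistent_keys(keys: Dict[str, Any]) -> Dict[str, Any]:
    """Return keys unchanged if every value is JSON-serializable, else raise."""
    for name, value in keys.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"persistent key {name!r} is not JSON-serializable"
            ) from exc
    return keys


ok = validate_persistent_keys({"environment": "prod", "version": 1})

try:
    validate_persistent_keys({"started_at": datetime(2025, 1, 1)})
    rejected = False
except ValueError:
    rejected = True  # datetime objects are not serializable by default
```

A real implementation might prefer a fallback serializer over a hard failure; that choice is left open here.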

📊 Risk Assessment

Risk             | Probability | Impact | Mitigation
API Confusion    | HIGH        | MEDIUM | Clear documentation + TypeScript consistency
Breaking Changes | LOW         | HIGH   | Careful backwards compatibility
Performance      | LOW         | LOW    | Minimal overhead for key tracking
Memory Leaks     | MEDIUM      | MEDIUM | Proper cleanup in Lambda context

🎯 Next Steps

  1. Decision: Choose API approach (recommend Option 1 for TypeScript consistency)
  2. Prototype: Implement minimal version for validation
  3. Testing: Extensive edge case testing
  4. Documentation: Clear examples showing temporary vs persistent distinction
  5. Community Feedback: Get input on API design before implementation

💭 Final Thoughts

Your concern about API confusion is 100% valid! The distinction between temporary and persistent keys isn't immediately obvious. Following the TypeScript approach with explicit method names (append_persistent_keys, remove_persistent_keys) would provide the clearest API while maintaining consistency across Powertools languages.

The technical implementation is definitely feasible, but requires careful coordination between Logger and Formatter classes to maintain backwards compatibility while adding the new persistent behavior.

Ready to dive deeper into any specific aspect! 🚀

dcabib, Aug 29 '25 17:08

Wow, thank you for the incredibly detailed analysis and write-up, @dcabib!

For what it's worth, my 2 cents as a user: the different language versions should be as similar as possible. The fewer differences there are, the easier it is to use this library when users have to hop between projects in different languages.

jth08527, Sep 03 '25 17:09

Hi both, feature parity and keeping the different versions of Powertools for AWS as similar as possible to each other is definitely important to us.

For this specific item we're not yet ready to make a decision because we're working on some items that we'll share over the coming weeks and that might impact this area of the code.

For now I'll put this issue on hold, but we'll revisit it before end of the year.

dreamorosi, Sep 08 '25 11:09