powertools-lambda-typescript icon indicating copy to clipboard operation
powertools-lambda-typescript copied to clipboard

RFC: Auto Mask Secrets

Open jasonwadsworth opened this issue 2 years ago • 17 comments

Description of the feature request

Automatically mask secret or password values in extra data.

Problem statement

A best practice is to never log secrets or passwords. There are times when I have an object that might have one of these things in it. I'd like to be able to pass in the object without needing to pull out the secrets manually.

Summary of the feature

If turned on, any extra data passed into the logger would be looked at and secrets/passwords would be masked. Consider the following data.

{
  username: "my-username",
  password: "my-password"
}

The code would see a key that contains "password" (this could be a configurable regex with a default) and would replace it with something like "*****", resulting in the following extra data being logged.

{
  username: "mu-username",
  password: "*****"
}

No secrets in logs

Code examples

const logger = new Logger();

const objectWithASecret = {
  username: "my-username",
  password: "my-password"
};

logger.debug('Logging in', { objectWithASecret });  

Benefits for you and the wider AWS community

Safer logging.

Describe alternatives you've considered

Creating a function that does this for me. The downside to this is twofold. One, I have to include the function call everywhere, so if I forget it I risk writing something I shouldn't to the logs. Two, the devex is slightly worse because I have to do something like { extraData: cleanExtraData(extraData) } instead of just { extraData }

Additional context

Related issues, RFCs

jasonwadsworth avatar Apr 05 '22 16:04 jasonwadsworth

I agree that credentials should not be logged as per ASVS 7.1.1. There are other types of data that should not be logged as well based on who owns the data or where it is stored.

How would someone automate this? a) Have a list of keys that should be hidden? English speaking developers might use password or token, but what about in other languages? b) Detect secret like patterns? Secrets could look like other data? c) Mix of both?

What are the performance implications need to be considered compared to other approaches?

In @middy/input-output-logger we opted for allowing a developer to list what paths to omit using omitPaths. In my experience secrets/PII is well known during development and not overwhelming to manually list per handler.

I believe GitHub Actions has this functionality built in, they might have a package that could be used. Git secrets scanner may also be a source of ideas for this (https://github.com/awslabs/git-secrets).

willfarrell avatar Apr 05 '22 19:04 willfarrell

Hi @jasonwadsworth thanks a lot for the feature request.

We have had a similar feature on the radar for all the runtimes, although it seems this one has a more reduced scope. You can find more details in this other issue awslabs/aws-lambda-powertools-python#1173

I'm 100% on board and in favor of the concept of not logging secrets and leaving the responsibility to the developer seems like a sensible choice.

At the moment though, we want to make sure to make the base feature set of the core utilities as stable and mature as possible before considering new features. After going GA and having collected enough feedback from customers we'll definitely consider this type of request.

dreamorosi avatar Apr 05 '22 21:04 dreamorosi

I searched for something like this and didn't find it. Thanks for the link to the roadmap. Also completely understand on timing. This was just something that came up today so I thought I'd add it.

This is one case where I'd like to make it easier for devs to not make mistakes, so it would be nice to not leave it to the devs. Having it at the configuration point of creating the logger would be great, since that can be done in a single place.

jasonwadsworth avatar Apr 05 '22 23:04 jasonwadsworth

I like this feature. I suggest that the user can define which keys to be filtered instead of having defaults.

damianesteban avatar Jul 31 '22 18:07 damianesteban

+1

RajivSah avatar Aug 30 '22 01:08 RajivSah

Hey guys, I'm interested in contributing. I was thinking about allowing for a maskedValues array on logger creation and then under createAndPopulateLogItem add a function that cross checks key value pairs with the array and masks the values that correspond. Would that work? Should be non-intrusive and self contained.

wehnerjp avatar Oct 17 '23 20:10 wehnerjp

@sthulb

wehnerjp avatar Oct 19 '23 19:10 wehnerjp

Hi @wehnerjp, thank you for the offering to contribute to the feature and project and sorry for the slight delay.

I saw your message yesterday and I wanted to have the time to formulate a proper response.

Adding a Data Masking utility is something that we are strongly considering doing, however we'd like to do so with a bigger scope than just obfuscating, which is what is currently being implemented in the Python version of Powertools.

Obfuscation in itself is not necessarily complex, however after talking with other customers and security experts we have found that the next natural step after obfuscation would be to allow for recovering the sensitive data that was masked. To do this however we can't simply find & replace all the sensitive values with ***. For the values to be rehydrated, we need to use a deterministic approach that can be reversed (aka encryption) - see more details in aws-powertools/powertools-lambda-python#1173.

Since this issue was created, work on the Python side of Powertools has continued and the feature has gone through a RFC (aws-powertools/powertools-lambda-python#1858), which eventually coalesced around an implementation that uses KMS and an Encryption SDK, to encrypt the sensitive values.

Later, the team has worked on an initial implementation (aws-powertools/powertools-lambda-python#2197) using the specification agreed in the RFC, which was merged 3 weeks ago.

As stated in our public roadmap, one of our main focuses as a project is to eventually reach feature parity with Powertools for AWS Lambda (Python), so an eventual implementation of a Data Masking utility in this project will look more or less the same to the one they just merged in terms of feature set and overall scope.

At the moment we are focusing on getting version 2 out (#1714) and two new utilities: Parser (#1334) & Validation (#508), however I'm more than open to start discussing and refining this RFC if you or anyone else in the thread/community is interested in contributing it. The goal would be to study the RFC and implementation in Python and come up with a similar RFC here that incorporates the same scope & features, but following idioms and conventions from this project and the wider ecosystem (Node.js / TS).

If instead you're only interested in obfuscating values and are not interested in encrypting them and/or being able to retrieve them, then you'll be able to do this with the existing feature set of Logger real soon.

As mentioned, we are currently working on our next major version which will make the Logger and its LogFormatter more customizable by allowing you to fully customize the log output.

At the moment the LogFormatter in v1 only gives you access to the standard keys managed by the Logger (i.e. level, message, timestamp, etc.), however this is going to change soon.

With the new version, you'll be able to create a custom LogFormatter like the one below, and obfuscate any value you want:

import { Logger, LogFormatter, LogItem } from "@aws-lambda-powertools/logger";
import type {
  LogAttributes,
  UnformattedAttributes,
} from "@aws-lambda-powertools/logger/types";

class MyFormatter extends LogFormatter {
  #keysToObfuscate: string[];

  public constructor({ keysToObfuscate }: { keysToObfuscate: string[] }) {
    super();
    this.#keysToObfuscate =
      keysToObfuscate && keysToObfuscate.length > 0 ? keysToObfuscate : [];
  }

  public formatAttributes(
    attributes: UnformattedAttributes,
    additionalLogAttributes: LogAttributes
  ): LogItem {
    return new LogItem({
      attributes: {
        ...this.obfuscate(attributes),
        ...this.obfuscate(additionalLogAttributes),
        timestamp: this.formatTimestamp(attributes.timestamp),
      },
    });
  }

  /**
   * Obfuscate the keys in the object that match the keysToObfuscate array.
   * If the object is nested, recursively obfuscate the keys.
   * 
   * @note this is a sample & naïve implementation - it can probably be optimized
   */
  obfuscate(object: Record<string, unknown>) {
    const obfuscatedObject: Record<string, unknown> = {};
    Object.keys(object).forEach((key) => {
      if (this.#keysToObfuscate.includes(key)) {
        obfuscatedObject[key] = "****";
      } else if (typeof object[key] === "object") {
        obfuscatedObject[key] = this.obfuscate(
          object[key] as Record<string, unknown>
        );
      } else {
        obfuscatedObject[key] = object[key];
      }
    });
    return obfuscatedObject;
  }
}

Which you can then use in your Logger like this:

const logger = new Logger({
  logLevel: "DEBUG",
  logFormatter: new MyFormatter({
    keysToObfuscate: ["my_secret_field", "my_other_secret_field"],
  }),
});

I have tested the above on a build made from the code currently in the feat/v2 branch and the log output generated looks like this:

{
    "logLevel": "DEBUG",
    "timestamp": "2023-10-19T21:13:04.584Z",
    "message": "this log might contain secrets",
    "serviceName": "service_undefined",
    "my_secret_field": "****",
    "some_other_field": "some other value",
    "a_nested_object": {
        "nested_field": "nested value",
        "my_secret_field": "****",
        "another_nested_object": {
            "my_other_secret_field": "****"
        }
    }
}

We are planning on releasing the first alpha version (which will have this feature) early next week.

dreamorosi avatar Oct 19 '23 21:10 dreamorosi

Wow, that's fantastic. Really excited about the v2 updates. Thank you so much for your thoughtful response. Great info and very helpful. I ended up just creating a wrapper class around Logger that overrode the methods with a little search and replace subroutine similar to the one you illustrated above before passing it down to the super class invocation. I'll look into the python power tools functionality you're talking about. If it seems in my wheelhouse I'd be happy to contribute.

wehnerjp avatar Oct 20 '23 22:10 wehnerjp

Not sure if this is the right place to continue the conversation, but I got a chance to look into the python power tools implementation. It looks like the data masking is a separate utility from the logger rather than integrated into it. Do we want to just provide an encryption/decryption utility that can be imported, but has to be invoked separately from the Logger with the TS implementation as well?

wehnerjp avatar Oct 30 '23 17:10 wehnerjp

Hi @wehnerjp, yes this is the right place.

Ideally yes, we want to match the implementation and features in the Python version so it'd be a separate utility. This way customers can use it regardless of logging.

With that said, I don't think a deeper integration between the two is out of the question after we have released the first version of the new utility. In the meantime, customers can create a custom log formatter and apply the masking.

dreamorosi avatar Oct 30 '23 17:10 dreamorosi

Cool, I'll plan on just using this RFC then

wehnerjp avatar Oct 30 '23 18:10 wehnerjp

Hi guys, I've been chatting with Andrea and I wanted to take the next step to move forward on implementing this RFC in Typescript

Design proposal (Request for comments)

1. Summary

The goal of this document is to propose the scope and design of a Data Masking utility for Powertools for TypeScript. The basis for the implementation of this utility stems from the Python Powertools repo. We will use the current Python implementation as a starting point, and focus mainly on the differences in Typescript implementation.

As this is a step towards parity with the Python utility it is necessary to familiarize yourself with their implementation of Data Masking. The RFC for the Python Data Masking utility and the most substantative feature commit can provide helpful context if you lack prior knowledge.

2. Motivation

We want to create a utility, that offers developers a straightforward way to enhance security within their serverless applications.

Data Masking is an important capability that will ease the out of the box adoption of powertools for customers hoping to leverage our provided utilities in a comprehensive and secure way.

Prior to this, a wrapper class was necessary for consumers who wanted to scrub sensitive data. While this is a viable solution, it is not ideal as it requires additional code and is not as intuitive as a native utility (and potentially as a decorator for other native utilities like logger).

3. Utility interface

With this utility, customers only need to initialize a data masking instance and invoke it providing 3 parameters:

  • A value or object,
  • A list of fields to act on
  • Action to take (ex. encrypt, decrypt, mask)

Initially, there is only two instances of the data masking utility we'll aim to support:

  1. Data masking via decorator (at handler or function level). Can mask incoming event data and outgoing return data.

    • I want to note here that we will be using stage 2 decorators because stage 3 aren't supported by esbuild yet. You can find more information on this here. Stage 2 Documentation can be found here.
  2. Data masking via direct method invocation. (at any point within the handler function)

3.0.1. Mock Implementation of the Data Masking Class dataMaskFilter method:

export class DataMasking {

/**
 * Decorator function for applying data masking to a class or method.
 * It wraps the original method with data masking logic based on the provided options.
 *
 * @param options - Configuration options for data masking.
 * @returns A decorator function to be applied to a class or method.
 */
public dataMaskFilter = function (
  options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
  target: unknown,
  propertyKey: string,
  descriptor: PropertyDescriptor
) => PropertyDescriptor {
  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const original = descriptor.value;
    // Wrap the original method with data masking logic
    descriptor.value = function (...args: any[]) {
      // Apply data masking to the arguments
      let maskedArgs = this.filterData(options, ...args);

      // Call the original method with masked arguments
      const result = original.call(this, ...maskedArgs);
      return result;
    };

    return descriptor;
  };
};


public mask(){
  /*...Masking Logic Here...*/
}
}

3.1.1. Data Masking handler (via Decorator)

In this section, we demonstrate the integration of the Data Masking utility into an AWS Lambda function handler using a decorator. Using decorators offers several benefits to end users, primarily enhancing code modularity, maintainability, and reducing boilerplate. The decorator encapsulates the cross-cutting concern of data masking, by abstracting away implementation details. This allows end users to focus on core functionality and maintain standardized approach to data masking.

class Lambda implements LambdaInterface {
  /**
   * Handler method for AWS Lambda functions, decorated with data masking.
   * It demonstrates the usage of the @masker.dataMaskFilter decorator to apply data masking to specific fields in the input data.
   *
   * @param _event - The input event object.
   * @param _context - The execution context.
   * @returns A Promise<void> representing the completion of the function.
   */
  @masker.dataMaskFilter({
    fieldsToMask: ['password', 'ssn'],
    defaultAction: 'mask',
  })
  public async handler(_event: any, _context: any): Promise<void> {
    // Original function logic here...
  }
}

3.2.1. Data Masking Direct Invocation

Here end users can reference our instance and manipulate the data within the handler directly. This approach offers flexibility for runtime customization and is particularly useful when the exact point of data masking application needs to be determined dynamically within the handler function. Users can easily store the output to other variables (like environment variables), or use some part of the output in a further AWS resource invocation. This allows end users to tailor their utilization of the package at any point in the runtime to a specific usecase.

// Create an instance of the DataMasking utility
const masker = new DataMasking();

class Lambda implements LambdaInterface {
  /**
   * Handler method for AWS Lambda functions with direct data masking invocation.
   *
   * @param _event - The input event object.
   * @param _context - The execution context.
   * @returns A Promise<void> representing the completion of the function.
   */
  public async handler(_event: any, _context: any): Promise<void> {
    // Original function logic here...

    // Sample data containing sensitive information
    const data = {
      project: 'powertools',
      sensitive: 'xxxxxxxxxx',
    };

    // Apply data masking to specific fields
    const masked = masker.mask(data, ['sensitive']);

    // Return the masked data or perform additional logic as needed
    return masked;
  }
}

4. Additional Features

These features may not be released as part of the first release, but should be considered for implementation soon thereafter.

1. Integration with other utilities

In the future, it's worth considering using the decorators implementation to allow one instantiated class instance (with a list of masked fields) to activate masking via an optional passed parameter in method props across different powertools utilities. For example:

import { LambdaInterface } from '@aws-lambda-powertools/interfaces';
import { Logger } from '@aws-lambda-powertools/logger';
import { DataMasking } from '@aws-lambda-powertools/data-masking';


class MyLambda implements LambdaInterface {
  public async handler(event: any, context: any) {
    /*...Function logic here...*/
    const log = new Logger({
    logFormatter: new MyLogFormatter(),
    serviceName: "mx-docs",
    // Anything passed into the info, warn, error methods would be parsed for fields and masked
    dataMasking: {
      fieldsToMask: ['password', 'ssn'],
      defaultAction: 'mask',
    },
  });
  }
}

This integration would streamline end users ability to leverage data masking in a variety of contexts within our powertools suite. Rather than a mandatory two step process for native powertools classes (either a wrapper class solution or executing a masking operation inline before passing data to another powertools utility), we could allow for a single step process where masking could be activated via one additonal param in the method invocation. This would mirror our users ability to leverage the decorator functionality in their own class methods.

2. Alternate ways to set fields to mask

We could potentially pull from a set process.env variable for a list of data masking fields too.

3. Add middy implementation

Similar to Logger

4. Design time Data Masking Decorator

Similar to 'idempotent' Decorator this will allow users to leverage the decorator to mask data in their own class methods. It won't require external instantiation

export const masked = function (
  options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
  target: unknown,
  propertyKey: string,
  descriptor: PropertyDescriptor
) => PropertyDescriptor {

  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const childFunction = descriptor.value;
    descriptor.value = maskData(childFunction, options);

    return descriptor;
  };
};

5. Opened discussions and decisions

This is a list of important decision points from maintainer discussion. Subject to changes from comments from community.

Just a note, this data masking utility can be leveraged for logging in tandem with Cloudwatch Logs sensitive data protection. End users may find a combination more suitable than either individually and may not have awareness of one or both options.

1. References

Orignal RFC from from Python version https://github.com/aws-powertools/powertools-lambda-python/issues/1858

wehnerjp avatar Nov 13 '23 22:11 wehnerjp