gloo icon indicating copy to clipboard operation
gloo copied to clipboard

Transform to replace a portion of a header value via regex

Open bdecoste opened this issue 1 year ago • 7 comments

Gloo Edge Product

Enterprise

Gloo Edge Version

v1.13.x

Is your feature request related to a problem? Please describe.

We need to be able to replace a portion of a request header (e.g. cookie) that is sent to the upstream

Describe the solution you'd like

Inja transform to replace a portion of a header value via regex

Describe alternatives you've considered

No response

Additional Context

No response

bdecoste avatar Sep 20 '23 18:09 bdecoste

https://github.com/solo-io/gloo-mesh-enterprise/issues/11794

bdecoste avatar Sep 20 '23 18:09 bdecoste

+1 - looking for an Inja callback to perform the same thing.

The callback format should be something like replace(textToOperateOn: string, regexToSearchFor: string, stringToReplaceWith: string) that replaces each substring of textToOperateOn that matches the regular expression regexToSearchFor with stringToReplaceWith.

marcobjorge avatar Nov 21 '23 13:11 marcobjorge

Customers asking for replace function to be available in inja

edubonifs avatar Nov 21 '23 14:11 edubonifs

Use case of the client:

At this moment, if you have hundreds of RouteTables and you have different headers reaching each matcher in every RouteTable, if you want to define a RateLimitConfig, you should:

Create RateLimitConfig specifying a a different header for each RouteTable or the same header but different values for each RouteTable.

This would grow a lot as we would need many different RateLimitConfigs for each RouteTable. In summary:

They want to start applying rate limits PER tenant (HTTP header) PER operation. There are two ways to do this: (1) using a distinct rate limit per operation (hundreds of RateLimitConfig with hundreds of distinct Route (s)) (2) using a reusable rate limit for all operations where the operation id is part of the rate limit descriptor (single Route per service, single RateLimitConfig per service)

Solution (1) has a relevant maintenance and development effort as they’ll need to create individual objects for each single operation. It also has a huge toll on the system as they’ll have hundreds of routes to apply each different rate limit to different operations Solution (2) is elegant and simple - they can have a single Route per service and a single RateLimitConfig per service. It requires less maintenance from their engineering teams and doesn’t have as much an impact on GE.

If they would be able to use replace function in inja, they would only need one RateLimitConfig for every RouteTable, as they could manipulate headers using regex.

edubonifs avatar Nov 21 '23 15:11 edubonifs

Sorry had made a duplicate internal issue. Formally assigning this oss issue as well.

nfuden avatar Jan 08 '24 14:01 nfuden

Design/Implementation Plan

Overview

  • The primary concern here is whether this functionality should be implemented as an Inja callback or if we are better off leveraging the transformation filter's existing extractor functionality to satisfy the RFE
    • Gloo Edge Inja documentation: https://docs.solo.io/gloo-edge/1.7.23/guides/traffic_management/request_processing/transformations/#templating-language
    • Gloo Edge extractor documentation: https://docs.solo.io/gloo-edge/1.7.23/guides/traffic_management/request_processing/transformations/#extractors
  • I propose that we proceed with an Inja callback, as these are powerful and require a less invasive addition to the codebase to implement
  • My current plan also introduces a new transformation template field called regexes, which is similar to extractors.
    • I considered the possibility of re-using extractors for this purpose, but it did not seem feasible to me because of how they are currently handled.

Context

  • I put up a data-plane prototype PR that implements this functionality as an inja callback
    • The prototype was straightforward to implement and is functional
    • However, this prototype compiles a regex on the request path, which is not acceptable. If we are to proceed with an implementation like this, we need to find a strategy to compile regexes at configuration time
    • If we do not go ahead with this approach, it should be relatively straightforward to repurpose the tests introduced with this PR for use with another approach
  • In May 2023, Dimitri merged this PRwhich implements a new Inja transformation (replace_with_random) for a similar but slightly different use case
    • This callback will find and replace instances of a given pattern. It is important to note that this does not make use of regexes -- it does a find and replace for a static pattern, and does not require any sort of regex compilation

Regex compilation

  • The trickiest part of this work is finding a comfortable way to compile regexes at configuration time, as regex compilation is time consuming and should never be done on a performance-critical path
  • Thankfully, Jacob added support for parsing templates at init time with this PR https://github.com/solo-io/envoy-gloo/pull/248
    • We will need to update the InjaTransformer constructor to parse provided regexes and pass them into thread local transformer instances
  • Extractors compile regexes when they are constructed on the main thread https://github.com/solo-io/envoy-gloo/blob/main/source/extensions/filters/http/transformation/inja_transformer.cc#L59 , so we can borrow this approach

User facing configuration

  • In order to configure this functionality, a user will need to:
  • Configure a regex (this is a new API field) at the transformation-template level with a literal pattern
    • the pattern specified here will be compiled at config-time
  • Use the new regex_replace inja callback, takes in three arguments:
    • text_to_replace -- this is the text which the regex will run against
    • regex -- this is the name of the regex specified earlier in the transformationTemplate. At request-time, Envoy will evaluate text_to_replace against the pre-compiled regex whose name matches the value specified here
    • replacement_text -- any matches for regex in text_to_replace will be replaced with replacement_text
   options:
     transformations:
       requestTransformation:
         transformationTemplate:
	        regex:
	          foo:
	            pattern: '.*'
           headers:
             # By updating the :path pseudo-header, we update the request URI
             "bar":
               text: '{{ regex_replace(header("foo"), foo, baz) }}'

Data-plane implementation plan: add new regexes message to transformation filter proto

  1. Introduce a new regexes message to the transformation filter proto, similar to the extraction message
  • This message would look something like:
	message Regex {
		string pattern = 1;
		# string replacement_text = 2
	}
  • and would live in the TransformationTemplate next to extrators
    • Note that the message could optionally contain some replacement text, so users don't have to specify the replacement inline if they choose not to
  map<string, Extraction> extractors = 2;
+ map<string, Regex> regexes = 13;
  1. Add a new private field, regexes_ to the InjaTransformer (which is constructed at config time).
  2. Create a new class, Regex. This class has one field, pattern, which stores a compiled std::regex. On construction, this class will take in a std::string, and attempt to compile a pattern from it
  3. Populate the regexes_ field with the configuration data in the InjaTransformer constructor, similarly to how we currently do for extractors_
  4. Update InjaTransformer::transform (which is called on the request path) to pass pre-compiled regexes into the thread-level typed_tls_data struct, similarly to what we currently do with extractions
  5. Add a replace_callback inja callback, as is currently done in my prototype.
    1. This will be updated so that instead of compiling the regexes upon being called, it will retrieve the regexes that have already been passed into typed_tls_data

Testing Requirements

  • My prototype already introduces some data-plane unit and integration tests
  • These cover test cases such as:
    • ReplaceHappyPath: This test checks if the replace function correctly replaces a specified string ("foo") with another string ("bar") in the body of the request.
    • ReplaceNoMatch: This test verifies the behavior of the replace function when there is no match found for the string to be replaced ("foo") in the body of the request.
    • ReplaceInvalidRegex: This test checks how the replace function handles an invalid regular expression (in this case, an unbalanced bracket).
    • ReplaceMultipleInstances: This test ensures that the replace function correctly replaces all instances of a specified string ("foo") with another string ("bar") in the body of the request.
  • We will need to flesh out the unit tests with some additional implementation-specific checks, but the existing tests are a good start and should be viable in the finished PR
  • Additionally, we will need to add control-plane e2e tests against this functionality
    • I expect this to go relatively quickly, as we already have Inja e2e tests in gloo edge, so it should be relatively straightforward to follow the pattern

Proposed Timeline

  • Implementation:
    • Data plane implementation: 3 days
    • Data plane tests: .5 days
    • Data plane review process: 3 days
    • Control plane implementation (supporting new API fields): 1 day
    • Control plane e2e tests: .5 days
    • Control-plane review process 1.5 days
  • With this in mind, I expect this work to take around 10 days to implement, although I will update this issue with my progress

ben-taussig-solo avatar Jan 17 '24 20:01 ben-taussig-solo

Update: I have a nearly ready-for-review PR up here: https://github.com/solo-io/envoy-gloo/pull/301 As discussed in that PR's description, there are a couple of things I need to evaluate before this is ready for review, namely how the new functionality interacts with:

  • mergeExtractorsToBody
  • the extraction inja callback (which lets a user reference an extraction by name)

ben-taussig-solo avatar Jan 25 '24 15:01 ben-taussig-solo

Merged into Gloo Edge Enterprise:

As of yesterday. Please note that these versions have not yet been released.

ben-taussig-solo avatar Mar 19 '24 14:03 ben-taussig-solo