Support Body Transformations

Open arkodg opened this issue 1 year ago • 14 comments

Title: Support Body Transformations

Description: Envoy supports manipulating/transforming headers; it would be great to also support transforming the request and response body, to be able to

  • sanitize fields
  • support API conversions

Relevant Links:

Adding a list of other proxy implementations that support this

  • Amazon API Gateway https://docs.aws.amazon.com/apigateway/latest/developerguide/rest-api-data-transformations.html
  • Tyk https://tyk.io/docs/transform-traffic/request-body/
  • KrakenD https://www.krakend.io/docs/enterprise/backends/body-generator/
  • Gloo Edge https://docs.solo.io/gloo-edge/latest/guides/traffic_management/request_processing/transformations/
  • Kong https://docs.konghq.com/hub/kong-inc/request-transformer/
  • Apache Apisix https://apisix.apache.org/docs/apisix/plugins/body-transformer/
  • Apigee https://cloud.google.com/apigee/docs/api-platform/develop/shaping-and-converting-messages

arkodg avatar Aug 21 '24 21:08 arkodg

It would be great if this feature/filter could also support copying/setting fields from the body into headers, allowing routing based on the request body, which is an AI LLM Gateway use case; more in this doc. cc @robscott
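
A minimal sketch of that body-to-header idea, written in Python purely for illustration. It is not Envoy code or configuration: the header name `x-llm-model`, the `model` field, and the cluster names are all assumptions.

```python
import json

# Hypothetical header name and route table, chosen only for illustration.
MODEL_HEADER = "x-llm-model"
ROUTES = {"model-a": "cluster-a", "model-b": "cluster-b"}
DEFAULT_CLUSTER = "default-cluster"


def copy_body_field_to_header(body: bytes, headers: dict, field: str = "model") -> dict:
    """Return a copy of headers with one top-level JSON body field copied in."""
    out = dict(headers)
    try:
        parsed = json.loads(body)
    except ValueError:
        return out  # body is not valid JSON: leave the headers untouched
    if isinstance(parsed, dict) and isinstance(parsed.get(field), str):
        out[MODEL_HEADER] = parsed[field]
    return out


def pick_cluster(headers: dict) -> str:
    """Route on the header only, the way a header-based route match would."""
    return ROUTES.get(headers.get(MODEL_HEADER, ""), DEFAULT_CLUSTER)
```

Once the field is in a header, routing no longer needs to look at the body at all, which is the point of the request.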

arkodg avatar Aug 23 '24 00:08 arkodg

Mutating the request body is not encouraged, because it requires buffering the whole body and breaks streaming processing. But I also admit that there are lots of related requirements.

So, SGTM. And I can help to review the design and to sponsor this new extension if someone wants to take this. (note: a design proposal is necessary first).

wbpcode avatar Aug 23 '24 09:08 wbpcode

@wbpcode I will take a shot at this!

mathetake avatar Aug 23 '24 15:08 mathetake

/assign

mathetake avatar Aug 23 '24 15:08 mathetake

I think the technical challenge is making the mutations streaming. It's feasible as long as the body is structured. The buffering approach forces large connection buffers in multiplexed protocols, which is not scalable for multi-tenant gateways.

kyessenov avatar Aug 23 '24 16:08 kyessenov

In my experience, in the scenarios where this feature is required, the body is basically always JSON. It's almost impossible to make mutations streaming for that.

At least for now, I don't know how to build a general solution for long-lived streams with unbounded body length. So, I am inclined to ignore them at first.

wbpcode avatar Aug 23 '24 16:08 wbpcode

+1 to @wbpcode's suggestion of keeping streaming out of scope

arkodg avatar Aug 23 '24 17:08 arkodg

Yes, I didn't mean to require it. But we should be explicit about the limitations of a buffering approach (e.g. connection buffers must be at least the number of streams per connection × max buffered bytes). E.g. 1 MB JSON bodies with 100 H2 streams require up to 100 MB of connection buffers.
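
The sizing arithmetic above can be sketched as follows. The function name and the decimal-MB convention are assumptions for illustration only, not Envoy settings.

```python
MB = 1_000_000  # decimal megabyte, assumed here only to keep the numbers round


def worst_case_connection_buffer_bytes(max_streams: int, max_buffered_body_bytes: int) -> int:
    """Upper bound when every concurrent stream on a connection is fully buffered."""
    return max_streams * max_buffered_body_bytes


# 100 concurrent H2 streams, each buffering a 1 MB JSON body.
print(worst_case_connection_buffer_bytes(100, 1 * MB) // MB, "MB")  # prints "100 MB"
```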

kyessenov avatar Aug 23 '24 17:08 kyessenov

@arkodg and I quickly wrote a simple proposal: https://docs.google.com/document/d/1odMAistdE8OrJHqKvV4ou9vsZxB1Wepy0cxAyBJ-Bes/edit @wbpcode could you take a look when you get a chance? Thanks in advance!

mathetake avatar Oct 03 '24 17:10 mathetake

FWIW, the ext_proc filter supports body mutation, as well as header and trailer mutation.

tyxia avatar Oct 04 '24 18:10 tyxia

CC @TAOXUY - I think you also need some body transformations and extractions.

Back to my earlier buffering point: I think it would be nice to have separate buffering and streaming modes. You don't need to buffer to sanitize JSON fields, and the only meaningful benefit of this over ext_proc is performance: it must add significantly less overhead (e.g. hundreds of microseconds at P50), which is hard to do when you fully buffer.

kyessenov avatar Oct 04 '24 18:10 kyessenov

Yeah, our case is extracting protobuf and our needs are satisfied by these 2 filters https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_field_extraction_filter and https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/proto_message_extraction_filter

TAOXUY avatar Oct 04 '24 19:10 TAOXUY

I'm wondering what the negative effects of being able to do message transformation with a native filter are?

The positives I anticipate from this are ease of configuration for users, compared to developing their own ext_proc service, as well as reduced complexity from not making network calls to external processes.

missBerg avatar Oct 07 '24 17:10 missBerg

@wbpcode could you take a look at the doc we shared above when you get a chance? tia!

mathetake avatar Oct 16 '24 16:10 mathetake

I have some free bandwidth and will give this a try, supporting body transformation based on the substitution formatter. The initial version will support the following features:

  1. Transform the request and response body based on the substitution format string: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/substitution_format_string.proto#envoy-v3-api-msg-config-core-v3-jsonformatoptions.
  2. Request/response header mutations based on the body content.
  3. Filter state mutations based on the body content (to extract data from the body into filter state for logging/stats/other filters, without exposing it to clients).
  4. Only JSON requests and JSON responses are supported. (Stream/event support will come in the future.)
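
A rough sketch of what items 1 and 2 could look like in behavior, in Python for illustration only. This is not the planned Envoy implementation or its configuration syntax: the `$body.<path>` placeholder, the function names, and the template shape are all hypothetical and do not correspond to Envoy's substitution format commands.

```python
import json

PREFIX = "$body."  # hypothetical placeholder syntax, not an Envoy command operator


def lookup(doc, dotted_path):
    """Walk a dotted path such as 'user.id' through nested dicts; None if absent."""
    cur = doc
    for key in dotted_path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def render(template, doc):
    """Recursively replace '$body.<path>' strings in the template with values from doc."""
    if isinstance(template, dict):
        return {k: render(v, doc) for k, v in template.items()}
    if isinstance(template, list):
        return [render(v, doc) for v in template]
    if isinstance(template, str) and template.startswith(PREFIX):
        return lookup(doc, template[len(PREFIX):])
    return template


def transform(body: bytes, body_template: dict, header_template: dict):
    """Buffer and parse the whole JSON body, then build a new body and extra headers."""
    doc = json.loads(body)
    new_body = json.dumps(render(body_template, doc)).encode()
    rendered = render(header_template, doc)
    new_headers = {k: str(v) for k, v in rendered.items() if v is not None}
    return new_body, new_headers
```

Because the new body is built only from what the template names, any field not referenced (for example a sensitive one) is dropped, which covers the "sanitize fields" case from the issue description. The same extracted values could equally be written to filter state instead of headers, as in item 3.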

wbpcode avatar Jul 09 '25 03:07 wbpcode