envoy
Enhance ext_proc filter to support MXN streaming
This PR is for issue: https://github.com/envoyproxy/envoy/issues/32090. One of the use cases is compression performed by the external processing server.
It lets the side stream server buffer M request body chunks from Envoy, process them, and then send N chunks back to Envoy in STREAMED mode.
The ext_proc MXN streaming works this way:
- Enable MXN streaming by setting the body mode to MXN in the ext_proc filter config.
- Configure the trailer mode to SEND in the ext_proc filter config.
- Optionally, configure the header mode to SEND in the ext_proc filter config.
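The three steps above might look like the following filter config sketch. This is an assumption-laden illustration, not the PR's final API: the cluster name `ext_proc_server` is hypothetical, and `FULL_DUPLEX_STREAMED` is the enum value this body mode eventually landed under in the ext_proc `ProcessingMode` API, shown here in place of the "MXN" working name used in this thread.

```yaml
# Hedged sketch only: the cluster name and body-mode value are illustrative.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: ext_proc_server
    processing_mode:
      request_header_mode: SEND                 # step 3 (optional)
      request_body_mode: FULL_DUPLEX_STREAMED   # step 1: the MXN body mode
      request_trailer_mode: SEND                # step 2 (required)
```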
With the above config, Envoy sends the headers (if header mode SEND is configured), the body, and the trailers (if present) to the side stream server as they arrive. The server can buffer the headers, all or part of the body (M chunks), and the trailers (if present), then stream the mutated headers, the mutated body (possibly split into N chunks), and the mutated trailers (if present) back to Envoy.
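The M-in/N-out rechunking described above can be illustrated with a plain Python sketch (no gRPC plumbing; `zlib` stands in for whatever transformation the server applies, and `mxn_compress` is a hypothetical helper name): the server buffers the M chunks it received, transforms the combined body, and re-splits the result into however many chunks it likes.

```python
import zlib

def mxn_compress(chunks, out_chunk_size=5):
    """Buffer all M input chunks, compress the combined body, and
    re-split the result into fixed-size output chunks. The number of
    output chunks N is independent of the number of input chunks M."""
    data = b"".join(chunks)  # server-side buffering of the M chunks
    compressor = zlib.compressobj()
    compressed = compressor.compress(data) + compressor.flush()
    # N output chunks, each at most out_chunk_size bytes
    return [compressed[i:i + out_chunk_size]
            for i in range(0, len(compressed), out_chunk_size)]

# M = 3 input chunks in; some other number of chunks comes back out,
# and the reassembled body decompresses to the original data.
m_chunks = [b"aaaa", b"aaaa", b"aaaa"]
n_chunks = mxn_compress(m_chunks)
assert zlib.decompress(b"".join(n_chunks)) == b"".join(m_chunks)
```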
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
/assign @gbrail @htuch @jmarantz @tyxia @yanavlasov
@gbrail cannot be assigned to this issue.
/assign @tyxia
As codeowner for first pass.
The design is under internal review and discussion.
My opinion here is that we should remove the ext_proc internal buffering, which is essentially the ask in https://github.com/envoyproxy/envoy/issues/32090. But it is also good to hear other reviewers' opinions (i.e., this is not a gating factor if other reviewers approve it).
Adding wait here to avoid a daily ping on the maintainer on-call.
/wait
I agree; I think internal buffering and MxN complicate the model and make it harder to reason about.
IMHO, without an internal buffer, I am not sure how to support the case where the side stream server asks Envoy to send the original data as-is for certain sequences of data during the streaming. However, I am okay with putting this on wait.
I am not sure if no internal buffer, how to support the case if the side stream server ask Envoy to send the original data
IMHO the sidestream server can just send back the original data, as the original data has already been forwarded to the sidestream server.
The injected data is entirely under the sidestream server's control: add, remove, compress, or keep (even just keep certain parts), etc.
Asking the side stream server to send back the original data has network bandwidth and CPU processing implications. Is that a concern? IIUC, avoiding that is one of the reasons the Envoy ext_proc filter has this buffer in the first place.
To your initial point, I am wondering if we are in (or at least closer to) agreement that there is no hard blocker, from a functionality perspective, to removing the internal buffer? Besides, from a design perspective, the internal buffer is probably hard to reason about, as Josh pointed out.
Regarding performance, I agree it could help with the no-mutation case (though that case alone may be the wrong use of this mode, see point 3 below), but more broadly:
- First, removing internal buffering solves the one major perf issue: a large amount of data being buffered in Envoy. This issue is amplified in this mode/use case when the sidestream server wants to buffer data and delays its response.
- Second, transmission overhead depends on the customer use case. If they compress the data, the overhead is already smaller than on the existing request path. If they want to amplify the data, internal buffering will not help either.
- Third, using this mode with no mutation at all is probably the wrong use of it; observability mode exists for that purpose. That said, it is still supported and up to the customer's choice, i.e., it is not a blocking issue for removing the internal buffer.
@tyxia, thanks for the detailed comments!
For 3), I think there are use cases like security validation, where the ext_proc server examines the data content. If the data contains bad content, it sends back an immediate response telling Envoy to send a local reply and close the downstream client connection. If the data is legit, it sends back an empty body response telling Envoy to send the original data as-is. So, with the internal buffer, there is a network bandwidth and performance gain in such a case. IIUC, this case cannot use observability mode even though there is no data mutation.
For 1) and 2), the ext_proc filter chunk_queue only buffers the downstream data up to a certain point (until the watermark is raised). I am not aware of any functional problems with this today in synchronous mode, but please correct me if I am wrong.
@yanjunxiang-google Thanks for the reply!
I think your security validation case is already supported by the existing modes and doesn't effectively use this feature, because it doesn't mutate the data at all. EDIT: Actually, the internal buffer also breaks your security validation case for large data, due to the watermark raised by the internal buffer.
I hope we can step back from one particular use case to the broad view and motivation of this feature, which I understand as arbitrary mutation. That means the data injected by the sidestream server is independent of the mainstream data, so buffering the mainstream data in Envoy is redundant.
But again, I will not be the single blocking factor here if other reviewers approve it :) So it is probably good to convince the other reviewers.
BTW, a meeting with all reviewers would probably be more productive here.
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
hello, @yanjunxiang-google any updates on this PR, thanks?
Thanks for checking. We are changing the design a little bit for this feature. Will update the PR soon.
The PR needs a main merge - CI is breaking because the branch has diverged too much.
/assign @yanavlasov @tyxia @htuch
/assign @jmarantz
Adding @adisuissa for API changes.
Kind ping!
Kind ping!
/retest
Kind Ping!
Will wait for other approvals and then submit.
/wait-any
/wait
/wait (for #36999)
Kind Ping!
@adisuissa I did an upstream merge. It needs your API approval again. Thanks!
LGTM, as a WIP/good start.
I think the open comments above need to be addressed (along with some more tests/load tests) to complete this feature.