envoy icon indicating copy to clipboard operation
envoy copied to clipboard

admin: add streaming variant of the admin API

Open jmarantz opened this issue 1 year ago • 7 comments

Commit Message: In https://github.com/envoyproxy/envoy/pull/19693 a streaming stats implementation was introduced, dramatically reducing the amount of time and memory it takes to generate an admin http response. /config_dump (#32054) and /clusters_ (#32054) can also be improved in this way.

However, in some environments we do not want to expose an HTTP port for admin requests, and we must use the C++ API. However that API is not streaming: it buffers the entire content. This PR adds a new streaming API.

Mechanically this new functionality was easy to add as an externally callble C++ API as the Admin class already supported this model on behalf of /stats.

However there's a fair amount of complexity managing potential races between active streaming requests, server exit, and explicit cancellation of an in-progress request. So much of the effort and complexity in this PR is due to making that thread-safe and testing it.

Additional Description:

Note to reviewers: if this PR is too big it can be broken into 2. The significant additions to main_common.cc, main_common.h, and main_common_test.cc need to stay together in one PR, the rest of the changes are non-functional refactors which are needed for the larger changes to work; they could be reviewed and merged first.

Risk Level: medium -- this is a new API but it does refactor some existing functionality. Using the new functionality will add some risk also as there is some careful thought required to believe we have protected against all possibilities of shutdown/cancel/admin-request races. Testing: //test/..., and lots of tsan repeated tests for the new streaming tests. Lots of iteration to ensure every new line of main_common.cc is hit by coverage tests. Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Fixes: #31755

jmarantz avatar Feb 13 '24 03:02 jmarantz

As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/32346 was opened by jmarantz.

see: more, trace.

CC @envoyproxy/coverage-shephards: FYI only for changes made to (test/per_file_coverage.sh). envoyproxy/coverage-shephards assignee is @RyanTheOptimist

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/32346 was synchronize by jmarantz.

see: more, trace.

@RyanTheOptimist the coverage change is actually to increase the coverage expectation. I assume that should be an easy one.

jmarantz avatar Feb 21 '24 23:02 jmarantz

Just started looking into this PR. A high level question: To my understanding, the Admin interface(stats/config_dump,...) is served by the Envoy main-thread. Do you mean if the stats is sent out by streaming , it is actually served by a separate Envoy thread, and such streaming can be terminated by a command from main_thread?

yanjunxiang-google avatar Feb 23 '24 06:02 yanjunxiang-google

@yanjunxiang-google you are correct: /stats is served by the main thread. But it is served in chunks via a Request API internal to Envoy now, with a nextChunk method. Actually all admin endpoints go through that API but (for now) all the others deliver the entire content in one chunk. Thus, at least for stats, this mechanism enables streaming through the C++ endpoints and I have a demo I can show you internally about how much less memory that consumes, compared to buffering the entire response in one string.

See the PRs and Issues referenced in the description above for more detail.

jmarantz avatar Feb 23 '24 13:02 jmarantz

Per discussion with @ravenblackx I will add an FSM diagram or similar to explain the life of a streaming response.

Feel free to read the PR in parallel or hold off reviewing until after I add that.

jmarantz avatar Feb 23 '24 21:02 jmarantz

FSM diagram: let me konw of this helps: https://docs.google.com/drawings/d/1njUl1twApEMoxmjaG4b7optTh5fcb_YNcfSnkHbdfq0/edit

jmarantz avatar Feb 26 '24 01:02 jmarantz

thanks for the reviews; will do a pass over the comments later today.

jmarantz avatar Feb 27 '24 21:02 jmarantz

I got the thread-synchronizer strategy to work but decided it was more verbose and harder to read than the interlock call, so I dropped it.

jmarantz avatar Mar 07 '24 02:03 jmarantz

Filed https://github.com/envoyproxy/envoy/issues/32769 for test flake in ext_proc

/retest

jmarantz avatar Mar 07 '24 15:03 jmarantz

/retest

jmarantz avatar Mar 07 '24 16:03 jmarantz

/retest

jmarantz avatar Mar 07 '24 17:03 jmarantz

@tyxia @ravenblackx are there more comments? Thanks!

jmarantz avatar Mar 10 '24 18:03 jmarantz

Thansk for the great comments, @ravenblackx and @tyxia .

Greg can you do a senior maintainer pass? Thanks!

jmarantz avatar Mar 12 '24 17:03 jmarantz