envoy
envoy copied to clipboard
admin: add streaming variant of the admin API
Commit Message: In https://github.com/envoyproxy/envoy/pull/19693 a streaming stats implementation was introduced, dramatically reducing the amount of time and memory it takes to generate an admin http response. /config_dump (#32054) and /clusters_ (#32054) can also be improved in this way.
However, in some environments we do not want to expose an HTTP port for admin requests, and we must use the C++ API. However that API is not streaming: it buffers the entire content. This PR adds a new streaming API.
Mechanically this new functionality was easy to add as an externally callble C++ API as the Admin class already supported this model on behalf of /stats.
However there's a fair amount of complexity managing potential races between active streaming requests, server exit, and explicit cancellation of an in-progress request. So much of the effort and complexity in this PR is due to making that thread-safe and testing it.
Additional Description:
Note to reviewers: if this PR is too big it can be broken into 2. The significant additions to main_common.cc, main_common.h, and main_common_test.cc need to stay together in one PR, the rest of the changes are non-functional refactors which are needed for the larger changes to work; they could be reviewed and merged first.
Risk Level: medium -- this is a new API but it does refactor some existing functionality. Using the new functionality will add some risk also as there is some careful thought required to believe we have protected against all possibilities of shutdown/cancel/admin-request races. Testing: //test/..., and lots of tsan repeated tests for the new streaming tests. Lots of iteration to ensure every new line of main_common.cc is hit by coverage tests. Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Fixes: #31755
As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
CC @envoyproxy/coverage-shephards: FYI only for changes made to (test/per_file_coverage.sh).
envoyproxy/coverage-shephards assignee is @RyanTheOptimist
@RyanTheOptimist the coverage change is actually to increase the coverage expectation. I assume that should be an easy one.
Just started looking into this PR. A high level question: To my understanding, the Admin interface(stats/config_dump,...) is served by the Envoy main-thread. Do you mean if the stats is sent out by streaming , it is actually served by a separate Envoy thread, and such streaming can be terminated by a command from main_thread?
@yanjunxiang-google you are correct: /stats is served by the main thread. But it is served in chunks via a Request API internal to Envoy now, with a nextChunk method. Actually all admin endpoints go through that API but (for now) all the others deliver the entire content in one chunk. Thus, at least for stats, this mechanism enables streaming through the C++ endpoints and I have a demo I can show you internally about how much less memory that consumes, compared to buffering the entire response in one string.
See the PRs and Issues referenced in the description above for more detail.
Per discussion with @ravenblackx I will add an FSM diagram or similar to explain the life of a streaming response.
Feel free to read the PR in parallel or hold off reviewing until after I add that.
FSM diagram: let me konw of this helps: https://docs.google.com/drawings/d/1njUl1twApEMoxmjaG4b7optTh5fcb_YNcfSnkHbdfq0/edit
thanks for the reviews; will do a pass over the comments later today.
I got the thread-synchronizer strategy to work but decided it was more verbose and harder to read than the interlock call, so I dropped it.
Filed https://github.com/envoyproxy/envoy/issues/32769 for test flake in ext_proc
/retest
/retest
/retest
@tyxia @ravenblackx are there more comments? Thanks!
Thansk for the great comments, @ravenblackx and @tyxia .
Greg can you do a senior maintainer pass? Thanks!