proxy icon indicating copy to clipboard operation
proxy copied to clipboard

Enable envoy.filters.http.file_system_buffer

Open bacek opened this issue 10 months ago • 9 comments

What this PR does / why we need it:

Just add http filter file_system_buffer

Special notes for your reviewer:

Also fixes formatting on gcp_authn

bacek avatar Feb 10 '25 06:02 bacek

😊 Welcome @bacek! This is either your first contribution to the Istio proxy repo, or it's been a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

istio-policy-bot avatar Feb 10 '25 06:02 istio-policy-bot

Hi @bacek. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

istio-testing avatar Feb 10 '25 06:02 istio-testing

I'm not sure if this's a common requiement, can you raise this up in WG/TOC meeting?

cc @howardjohn @keithmattix

zirain avatar Feb 10 '25 06:02 zirain

I'm not sure if this's a common requiement, can you raise this up in WG/TOC meeting?

I do need to fully decouple appserver from the slow clients. FileSystemBuffer on the gateway is the default solution. I can't give infinite amount of memory to the in memory buffer. Are the any other options?

bacek avatar Feb 10 '25 20:02 bacek

@bacek at what point in the request/connection lifecycle are you seeing the slow behavior? During listener filter execution? HTTP parsing?

keithmattix avatar Feb 10 '25 21:02 keithmattix

@bacek at what point in the request/connection lifecycle are you seeing the slow behavior? During listener filter execution? HTTP parsing?

It's not Envoy itself which is slow. FileSystemBuffer is preventive measure from very slow network clients. Imaging scenario like:

  1. N "slow" clients are making API call. "Slow" as in network bandwidth. And N is bigger than DB connection pool (M).
  2. Each result can be quite big. For example 1 GB.
  3. Appserver will start sending response to all M request, keeping DB connection.
  4. Envoy buffers will fill and kicking off high watermark back up.
  5. Appserver's network socket will fill up and socket write will eventually block.
  6. After this appserver is fully DoSed, because all DB connections are occupied by slow clients.

As far as I understand, there are not many options available to prevent this.

One of them is to drop very slow clients, which will result in broken connection on the appsever and releasing DB connection. It's not ideal.

Another option is to deploy FileSystemBuffer with a somewhat large disk space and buffer appserver's response to the disk to server to the slow clients. Eventually. Disk space is much cheaper and I don't really care about latency in this case.

bacek avatar Feb 10 '25 21:02 bacek

I recall that istio want make istio-proxy act as a readonly container as possible(correct me if I'm wrong), in that case, I think you need bring this up to WG meeting or get an approval from TOC.

cc @istio/technical-oversight-committee

zirain avatar Feb 12 '25 10:02 zirain

I recall that istio want make istio-proxy act as a readonly container as possible(correct me if I'm wrong), in that case, I think you need bring this up to WG meeting or get an approval from TOC.

"Read only" as in "no state stored in the running pod" or "root fs inside pod is ro"?

In the former case it's not true, because there are in memory buffers which will require graceful shut-down to fully flush. And integration with cloud provider to flush/drain on load balancer (NLB with AlbController on AWS in my case).

For the latter I was using standard volume/volumeMounts feature to mount ephemeral volume into gateway for the FileSystemBuffer to use. So pod's root fs stays ro.

bacek avatar Feb 12 '25 10:02 bacek