[aws-cpp-sdk-s3-crt]: need default settings for TCP connection-monitoring options

Open grrtrr opened this issue 3 years ago • 1 comments

Describe the bug

We have repeatedly experienced S3CrtClient programs hanging because of stuck or low-progress TCP connections. These hangs can be avoided with TCP keep-alive probes and TCP connection-speed monitoring, both of which the main SDK enables by default.

The S3CrtClient and aws_s3_client do not enable either option by default: both have to be explicitly configured.

Expected Behavior

Programs should recover from bad TCP connections.

We documented problems caused by the absence of TCP keep-alive probes in https://github.com/awslabs/aws-c-s3/issues/210.
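For readers unfamiliar with the mechanism: at the socket level on Linux, keep-alive probing comes down to a few `setsockopt` calls. The following is a minimal sketch under that assumption. The helper name `enable_tcp_keepalive` and its parameters are hypothetical and are not taken from aws-sdk-cpp or aws-c-s3.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not an AWS API): turn on TCP keep-alive probes for an
// existing TCP socket using the Linux-specific TCP_KEEP* options.
// Returns true only if every option was applied.
bool enable_tcp_keepalive(int fd, int idle_secs, int interval_secs, int max_probes) {
    int on = 1;
    // Enable keep-alive probing on the socket.
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) return false;
    // Idle time before the first probe is sent.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) != 0) return false;
    // Interval between subsequent probes.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) != 0) return false;
    // Number of unanswered probes before the kernel drops the connection.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &max_probes, sizeof(max_probes)) != 0) return false;
    return true;
}
```

With illustrative values, `enable_tcp_keepalive(fd, 30, 10, 3)` would send the first probe after 30 seconds of idle time, repeat every 10 seconds, and let the kernel drop the connection after 3 unanswered probes, so a blocked read on a dead peer fails instead of hanging indefinitely.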

Keep-alive probes do not cover all cases - a connection may respond to keep-alive probes but still make little or no progress. One possible cause is CPU starvation: we have observed stuck programs that did no more than read a small file from S3 and write out a few files to S3 while the CPU load was near 100%.
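To illustrate what connection-speed monitoring adds on top of keep-alive probes, here is a rough sketch of a throughput watchdog. `LowSpeedMonitor`, its threshold, and its window are hypothetical names and parameters chosen for illustration. This is not how the SDK or aws-c-s3 implements monitoring, only the general idea of flagging a connection that stays below a minimum rate for a full observation window.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical sketch (not an AWS SDK type): reports a stall when the average
// throughput over a full observation window is below min_bytes_per_sec.
class LowSpeedMonitor {
public:
    using Clock = std::chrono::steady_clock;

    LowSpeedMonitor(std::size_t min_bytes_per_sec,
                    std::chrono::seconds window,
                    Clock::time_point start)
        : min_bytes_per_sec_(min_bytes_per_sec),
          window_(window),
          window_start_(start) {}

    // Record bytes transferred on the connection.
    void on_bytes(std::size_t n) { bytes_in_window_ += n; }

    // Call periodically. Returns false until a full window has elapsed, then
    // evaluates the window, resets the counters, and returns the verdict.
    bool is_stalled(Clock::time_point now) {
        if (now - window_start_ < window_) return false;
        const double secs =
            std::chrono::duration<double>(now - window_start_).count();
        const double rate = static_cast<double>(bytes_in_window_) / secs;
        const bool stalled = rate < static_cast<double>(min_bytes_per_sec_);
        window_start_ = now;
        bytes_in_window_ = 0;
        return stalled;
    }

private:
    std::size_t min_bytes_per_sec_;
    std::chrono::seconds window_;
    Clock::time_point window_start_;
    std::size_t bytes_in_window_ = 0;
};
```

A caller would feed `on_bytes` from its read/write path and close and retry the connection once `is_stalled` returns true. Unlike keep-alive probes, this catches a peer that is still reachable but transfers almost nothing.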

There is likely a reason why the main SDK enables both TCP keep-alives and TCP connection-speed monitoring by default.

We should be able to expect the same defaults (and recovery from bad TCP connections) in the S3CrtClient.

Current Behavior

There is no recovery from bad TCP connections currently.

Reproduction Steps

Please see https://github.com/awslabs/aws-c-s3/issues/210 for details on TCP keep-alive issues.

These problems occur whenever a TCP connection goes bad. Tools like tc and/or iptables can be used to emulate such connections.

Possible Solution

  • https://github.com/awslabs/aws-c-s3/pull/204 exposed TCP keep-alive and connection-monitoring options for the s3_client;
  • https://github.com/aws/aws-sdk-cpp/issues/1882 extended the options for the S3CrtClient;
  • https://github.com/aws/aws-sdk-cpp/pull/2101 enables TCP keep-alive and connection-monitoring by default (i.e. fixes the problem reported above).

Additional Information/Context

No response

AWS CPP SDK version used

1.9.x (1.9.170, but problem also on master).

Compiler and Version used

gcc / clang

Operating System and version

Linux, Ubuntu 18.04

grrtrr, Sep 22 '22 18:09

Thanks for making the PR. I'm reviewing the changes that you've made.

jmklix, Sep 22 '22 21:09

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot], Oct 28 '22 23:10