S3 Clients are leaving behind open PIPE file descriptors
Describe the bug
We are using the AWS SDK for Java v2 in one of our processes, which ran out of file descriptors and led us to investigate the open ones. We can see new FDs being added and never closed, even when we only make simple headBucket calls to S3. Example of the open FDs that get added:
java 6320 root 182u a_inode 0,14 0 1057 [eventpoll:183]
java 6320 root 183r FIFO 0,13 0t0 275505 pipe
java 6320 root 184w FIFO 0,13 0t0 275505 pipe
java 6320 root 185u a_inode 0,14 0 1057 [eventpoll:186]
java 6320 root 186r FIFO 0,13 0t0 275506 pipe
java 6320 root 187w FIFO 0,13 0t0 275506 pipe
java 6320 root 188u a_inode 0,14 0 1057 [eventpoll:189]
java 6320 root 189r FIFO 0,13 0t0 275507 pipe
java 6320 root 190w FIFO 0,13 0t0 275507 pipe
Here's how we are building our client:
S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
        .credentialsProvider(credentialsProvider)
        .region(Region.US_EAST_1)
        .build();
S3TransferManager s3TransferManager = S3TransferManager.builder()
        .s3Client(s3AsyncClient)
        .build();
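The simple calls mentioned above look roughly like this (the bucket name is a placeholder):
// Minimal sketch of the kind of request after which we observe new
// pipe/eventpoll descriptors; "example-bucket" is a placeholder.
HeadBucketResponse response = s3AsyncClient
        .headBucket(b -> b.bucket("example-bucket"))
        .join();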
We also have an S3Client, an StsClient, and an S3Presigner set up similarly, and we call .close() explicitly on each of them. Regardless of whether close() is called, any file handles opened by the clients should be released when the objects are destroyed, which is not happening.
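For illustration, those other clients are constructed and closed roughly like this (details simplified, same credentialsProvider as above):
// Simplified sketch of the other clients mentioned above; each one is
// closed explicitly once we are done with it.
S3Client s3Client = S3Client.builder()
        .credentialsProvider(credentialsProvider)
        .region(Region.US_EAST_1)
        .build();
StsClient stsClient = StsClient.builder()
        .credentialsProvider(credentialsProvider)
        .region(Region.US_EAST_1)
        .build();
S3Presigner s3Presigner = S3Presigner.builder()
        .credentialsProvider(credentialsProvider)
        .region(Region.US_EAST_1)
        .build();
// ... requests / presigning ...
s3Client.close();
stsClient.close();
s3Presigner.close();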
The same is not seen with the v1 SDK.
Expected Behavior
Any file descriptors opened by SDK operations should be released when the clients are closed and the objects are destroyed.
Current Behavior
Pipe and anon_inode file descriptors are left behind even after calling close() on the clients.
Reproduction Steps
Create a long-running process (for example, an API server that makes requests to S3) that instantiates an S3 client and issues some basic requests such as headBucket, and keep the process up after the requests complete. Compare the open file descriptors before and after the requests using sudo lsof -p <pid> or sudo ls -l /proc/<pid>/fd.
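For an in-process comparison, a rough, Linux-only sketch that counts this JVM's open descriptors by listing /proc/self/fd (class and method names are just placeholders):
import java.io.File;

// Rough sketch: count the current JVM's open file descriptors on Linux by
// listing /proc/self/fd. Call it before and after the S3 requests and
// compare the two values.
public final class FdCount {
    public static long openFdCount() {
        File[] fds = new File("/proc/self/fd").listFiles();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFdCount());
    }
}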
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.25.26
JDK version used
openjdk version "1.8.0_402"
Operating System and version
Rocky Linux release 9.4 (Blue Onyx)
We were able to narrow this down to the CRT-based S3AsyncClient and the S3TransferManager. Here is the scenario:
Step 1. Check the count of open file descriptors of the Java process using: sudo ls -l /proc/<pid>/fd | wc -l
Our output:
sudo ls -l /proc/733969/fd | wc -l
123
Step 2. Execute this in the process:
S3AsyncClient s3AsyncClient = S3AsyncClient.crtBuilder()
        .credentialsProvider(credentialsProvider)
        .region(Region.US_EAST_1)
        .build();
S3TransferManager s3TransferManager = S3TransferManager.builder()
        .s3Client(s3AsyncClient)
        .build();
Step 3. Run the same command as in Step 1. Our output:
sudo ls -l /proc/733969/fd | wc -l
205
Example of the FDs opened by the process (from sudo ls -l /proc/<pid>/fd):
lrwx------ 1 dyutishb dyutishb 64 Jun 19 15:41 183 -> 'anon_inode:[eventpoll]'
lr-x------ 1 dyutishb dyutishb 64 Jun 19 15:41 184 -> 'pipe:[16427762]'
l-wx------ 1 dyutishb dyutishb 64 Jun 19 15:41 185 -> 'pipe:[16427762]'
lrwx------ 1 dyutishb dyutishb 64 Jun 19 15:41 186 -> 'anon_inode:[eventpoll]'
lr-x------ 1 dyutishb dyutishb 64 Jun 19 15:41 187 -> 'pipe:[16427763]'
l-wx------ 1 dyutishb dyutishb 64 Jun 19 15:41 188 -> 'pipe:[16427763]'
Step 4. Close the clients using:
s3AsyncClient.close();
s3TransferManager.close();
Step 5. Repeat the command from Steps 1 and 3. Our output:
sudo ls -l /proc/733969/fd | wc -l
169
We still see similar open FDs:
lrwx------ 1 dyutishb dyutishb 64 Jun 19 15:41 161 -> 'anon_inode:[eventpoll]'
lr-x------ 1 dyutishb dyutishb 64 Jun 19 15:41 162 -> 'pipe:[16427755]'
l-wx------ 1 dyutishb dyutishb 64 Jun 19 15:41 163 -> 'pipe:[16427755]'
lrwx------ 1 dyutishb dyutishb 64 Jun 19 15:41 164 -> 'anon_inode:[eventpoll]'
lr-x------ 1 dyutishb dyutishb 64 Jun 19 15:41 165 -> 'pipe:[16427756]'
l-wx------ 1 dyutishb dyutishb 64 Jun 19 15:41 166 -> 'pipe:[16427756]'
The close() calls clear out some of the open pipe handles, but not all of them, and the remaining handles are not released even when the objects are destroyed. This eventually leaves our process with too many open handles.
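To make the accumulation easier to observe, here is a rough sketch of a loop that repeatedly builds and closes the CRT client and transfer manager, printing the descriptor count after each iteration (it reuses the hypothetical FdCount helper sketched above):
// Sketch: repeatedly create and close the CRT-based async client and the
// transfer manager, then print how many descriptors remain open.
for (int i = 0; i < 10; i++) {
    S3AsyncClient client = S3AsyncClient.crtBuilder()
            .credentialsProvider(credentialsProvider)
            .region(Region.US_EAST_1)
            .build();
    S3TransferManager transferManager = S3TransferManager.builder()
            .s3Client(client)
            .build();
    transferManager.close();
    client.close();
    System.out.println("iteration " + i + ": open fds = " + FdCount.openFdCount());
}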
Similar to: https://github.com/aws/aws-sdk-java-v2/issues/5271
Is there any fix for this issue?
Greetings,
Are there any updates on this? Our production workloads in AWS are affected, and some are in a permanently degraded state.
Thanks for your attention.
Hello,
Is there any progress on this? We are seeing this issue in our production environments as well, and our systems are impaired because of it. Our software is also responsible for copying data between on-prem NFS mounts, so the application goes into a permanently degraded state whenever it hits the FD limit as a result of this leak.