s5cmd
Inconsistent downloads with a command.txt on AWS Lambda warm invocations
Hi there, I am using s5cmd to increase the S3 download/upload speed on AWS Lambda. I am having a problem with s5cmd on AWS Lambda warm invocations, as parts of a command.txt are sometimes skipped.
Environment:
- s5cmd version: 2.2.2.
- Python subprocess to call s5cmd inside a Lambda container function
subprocess.run(["/usr/local/bin/s5cmd", "--numworkers", "16", "run", "/tmp/project/personA/download_commands.txt"], shell=False, check=True)
- Maximum hardware specs for the AWS Lambda function
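To help narrow this down, here is a minimal sketch of how the same call could capture s5cmd's own output at debug level, so the Lambda logs show what s5cmd did for each line of the command file. The helper names `build_s5cmd_args` and `run_s5cmd` are hypothetical. The sketch assumes the global `--log` flag accepts `debug` and must precede the `run` subcommand, as `--numworkers` does in the call above.

```python
import subprocess

S5CMD_BIN = "/usr/local/bin/s5cmd"  # same binary path as in the call above


def build_s5cmd_args(command_file, workers=16, log_level="debug"):
    """Build the argument list; global flags go before the `run` subcommand."""
    return [
        S5CMD_BIN,
        "--numworkers", str(workers),
        "--log", log_level,  # assumption: debug level prints more detail per operation
        "run", command_file,
    ]


def run_s5cmd(command_file):
    """Run s5cmd, forward its output to the Lambda log, then raise on failure."""
    result = subprocess.run(
        build_s5cmd_args(command_file),
        shell=False,
        check=False,          # check manually so output is logged first
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    print(result.stderr)
    result.check_returncode()  # raises CalledProcessError on a non-zero exit code
    return result
```

Calling `run_s5cmd("/tmp/project/personA/download_commands.txt")` would then log both output streams even when s5cmd exits with code 0, which is the case where a folder is silently skipped.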
1. First invocation/cold start. A new Lambda instance spawns and everything is downloaded correctly. I can see log events for the copy commands.
download_commands.txt for personA:
[INFO] 2023-11-16T10:41:33.990Z <some-uui-id> cp s3://bucket/personA/video/0_folder/* /tmp/project/personA/video/0_folder/
cp s3://bucket/personA/video/3_folderA/* /tmp/project/personA/video/3_folderA/
cp s3://bucket/personA/video/3_folderB/* /tmp/project/personA/video/3_folderB/
cp s3://bucket/personA/video/7_folder/* /tmp/project/personA/video/7_folder/
cp s3://bucket/personA/video/11_folder/* /tmp/project/personA/video/11_folder/
I can see log events for the 0_folder, for example:
cp s3://bucket/personA/video/0_folder/prefix_A/part_10.obj /tmp/project/personA/video/0_folder/prefix_A/part_10.obj
2. Second invocation/warm start. Lambda reuses the same instance as in step 1.
download_commands.txt for personB:
[INFO] 2023-11-16T10:42:37.682Z <some-uui-id> cp s3://bucket/personB/video/0_folder/* /tmp/project/personB/video/0_folder/
cp s3://bucket/personB/video/3_folderA/* /tmp/project/personB/video/3_folderA/
cp s3://bucket/personB/video/3_folderB/* /tmp/project/personB/video/3_folderB/
cp s3://bucket/personB/video/7_folder/* /tmp/project/personB/video/7_folder/
cp s3://bucket/personB/video/11_folder/* /tmp/project/personB/video/11_folder/
No error message, but the 0_folder is not downloaded. No log events for copying any content of 0_folder. 3_folderA, 3_folderB, 7_folder and 11_folder are being downloaded and logged.
- The problem only occurs during Lambda warm invocations. It never happens for a Lambda cold start
- It can happen for any person's prefix. Sometimes personA, sometimes personB
- It does not only happen to 0_folder; it can happen to other folders too
- It shouldn't be a problem with data still existing in the Lambda container. There is a cleanup of /tmp/project/personA after every run, and download paths are different for each person.
- No problems with the AWS CLI
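Since s5cmd reports no error when a folder is skipped, a check after the run can at least detect the problem. The following is a rough sketch, not a fix: it reads the `cp` lines from the command file and reports every local destination that is missing or empty. The function names are hypothetical, and it assumes each line has the form `cp <source> <destination>` as in the files above.

```python
import os
import shlex


def parse_destinations(command_text):
    """Return the local destination (last token) of every `cp` line."""
    destinations = []
    for line in command_text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) >= 3 and tokens[0] == "cp":
            destinations.append(tokens[-1])
    return destinations


def is_missing_or_empty(path):
    """True if the directory does not exist or contains no entries."""
    if not os.path.isdir(path):
        return True
    with os.scandir(path) as entries:
        return not any(True for _ in entries)


def find_empty_destinations(destinations):
    """Return the destinations that received no files."""
    return [d for d in destinations if is_missing_or_empty(d)]
```

After the s5cmd call, something like `find_empty_destinations(parse_destinations(open(command_file).read()))` would return the skipped folders, so the Lambda could retry or fail loudly instead of continuing with incomplete data.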
Any help would be highly appreciated. It's a great tool and speeds up the S3 work significantly :)