s5cmd
cp of large files failing silently when dealing with large amounts of data
I experimented with the following s5cmd run command:
( echo cp -n -s -p 50 -c 70 "D:/200k_files/" "s3://bucket/31-03-2022/200k_files/" & echo cp -n -s -p 50 -c 70 "D:/big_files/" "s3://bucket/31-03-2022/big_files/" ) | s5cmd --numworkers 128 --retry-count 100 run >> D:/inventory.txt
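One caveat about my own logging setup: >> only appends stdout, and I'm not certain whether s5cmd writes failure messages to stdout or stderr, so a variant like the one below would capture both streams separately (D:/errors.txt is just a placeholder path):

( echo cp -n -s -p 50 -c 70 "D:/200k_files/" "s3://bucket/31-03-2022/200k_files/" & echo cp -n -s -p 50 -c 70 "D:/big_files/" "s3://bucket/31-03-2022/big_files/" ) | s5cmd --numworkers 128 --retry-count 100 run >> D:/inventory.txt 2>> D:/errors.txt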
In my 200k_files folder I have 200k copies of a small file (MB-sized), and in the big_files folder I have 10 copies of a 200GB file. This all adds up to about 9TB of data.
I'm on a 10Gbps line with 16GB of RAM and a 6-core i5-9500TE. Running the above pushes my CPU to 100% most of the time and takes up most of my RAM. After s5cmd completes, I run the command again to see whether anything is missing, and typically a few of the small files and half of the large files are missing. Sitting at 100% CPU and 90%+ RAM usage is concerning and probably the cause.
I checked my inventory.txt log and found no errors indicating failure. Do I need to scale down concurrency and numworkers?
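If scaling down is the way to go, I assume I'd run something like the following, with the numbers picked arbitrarily lower just to see whether the silent failures stop (they are guesses on my part, not recommended values):

( echo cp -n -s -p 50 -c 16 "D:/200k_files/" "s3://bucket/31-03-2022/200k_files/" & echo cp -n -s -p 50 -c 16 "D:/big_files/" "s3://bucket/31-03-2022/big_files/" ) | s5cmd --numworkers 32 --retry-count 100 run >> D:/inventory.txt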
I am on v1.4.0
If you push your hardware too far, it seems like copies of large amounts of data fail silently. I fixed this by lowering numworkers, at the cost of throughput. It still seems like a bug to me, though.
I suspect this is related to the -n and -s flags, as this behavior doesn't occur without them.
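For comparison, the variant that doesn't show the problem for me is essentially the same pipeline with those two flags dropped, so every object is re-uploaded unconditionally:

( echo cp -p 50 -c 70 "D:/200k_files/" "s3://bucket/31-03-2022/200k_files/" & echo cp -p 50 -c 70 "D:/big_files/" "s3://bucket/31-03-2022/big_files/" ) | s5cmd --numworkers 128 --retry-count 100 run >> D:/inventory.txt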
Hello, could you please try the same with the latest version and share the exit code of the command?
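If you're launching this from a Windows cmd prompt, the exit code of the last command in the pipeline can be printed right after it finishes (in PowerShell it would be $LASTEXITCODE instead):

rem run the same pipeline as above, then:
echo %ERRORLEVEL%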
@tianshiz Could you try with v2.1.0 and report back if this issue still persists? I'm closing this issue but please re-open if needed. Thank you.