s5cmd
Question: where do s5cmd's main performance benefits come from?
Hopefully pretty self-explanatory issue title. I've snooped around the code a bit, so I'll give my guess, but just wondered if others could chime in on what ends up giving s5cmd such impressive performance in the README charts.
- Re-use of aws sessions as noted here
- Ability to run various GET/DELETE requests in parallel as outlined in docs
- Specification of concurrency/part_size for large file download/upload
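To make the second and third guesses concrete, here is a minimal sketch of the general technique: split an object into fixed-size byte ranges and fetch them with a bounded number of requests in flight. This is not s5cmd's code (s5cmd is written in Go and calls the AWS SDK); `split_parts`, `fetch_object`, and `fake_ranged_get` are hypothetical names, and the "ranged GET" is an in-memory stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def split_parts(size, part_size):
    """Return inclusive (start, end) byte ranges covering an object of `size` bytes."""
    return [(off, min(off + part_size, size) - 1) for off in range(0, size, part_size)]


def fetch_object(fetch_range, size, part_size, concurrency):
    """Fetch all parts with at most `concurrency` calls in flight, then reassemble."""
    parts = split_parts(size, part_size)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order, so chunks arrive in offset order
        # even though the underlying calls may finish out of order.
        chunks = pool.map(lambda r: fetch_range(*r), parts)
        return b"".join(chunks)


# Hypothetical stand-in for a ranged GET against object storage (no network access).
DATA = bytes(range(256)) * 4


def fake_ranged_get(start, end):
    return DATA[start:end + 1]


result = fetch_object(fake_ranged_get, len(DATA), part_size=100, concurrency=8)
```

With a real client, `fetch_range` would issue a GET with a `Range: bytes=start-end` header over a shared, reused connection pool, which is where the session re-use from the first guess would come in.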
I guess I'm curious how much impact each of these has on the overall performance gains. I'll admit I was somewhat surprised that, "under the hood", the core aws-sdk routines are used for the actual requests. That made me wonder how s5cmd's performance could be so much better when built on top of them, hence the issue and the theories above.
Thanks!
(For context, I'm looking to implement a performant cloud storage API in the Julia language, so I'm looking at the "best" implementations people have been referring me to 😄)
I'm curious about this too. When I came across this project and tried it, I wondered what makes it fast and what the downside is. Reliability, maybe, since I guess s5cmd doesn't do checksums for verification.