artifact-manager-s3-plugin

Modified S3 uploads to be done in parallel

beryl-factset opened this issue 5 years ago • 6 comments

I've modified a few of the S3 functions to use `parallelStream()` instead of the sequential for loop; this should improve upload times drastically. I saw a drop from 26 minutes to 9 minutes for 10,000 5 KB files.
Jenkins issue: JENKINS-61936 https://issues.jenkins-ci.org/browse/JENKINS-61936
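
For context, here is a minimal sketch of the kind of change being described, assuming a helper that performs one blocking S3 request per file. The `Uploader` interface and method names are illustrative, not the plugin's actual API:

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: replacing a sequential for loop over artifact files
// with parallelStream(), so the blocking S3 PUTs run concurrently on the
// common ForkJoinPool instead of one at a time.
class ParallelUploadSketch {

    interface Uploader {
        void upload(Path file);
    }

    // Before: each upload waits for the previous one to finish.
    static void uploadSequential(List<Path> files, Uploader uploader) {
        for (Path file : files) {
            uploader.upload(file);
        }
    }

    // After: uploads are dispatched in parallel. With 10,000 small files,
    // per-request latency dominates total time, which is why this helps.
    static void uploadParallel(List<Path> files, Uploader uploader) {
        files.parallelStream().forEach(uploader::upload);
    }
}
```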

beryl-factset · May 06 '20 20:05

One point to make: all of our testing has been with regard to upload performance. We have not investigated what it would take to improve the download performance of artifacts hosted on S3.

@jglick if I'm reading this plugin correctly, it does not handle artifact downloads. I would assume, therefore, that artifact download (specifically, downloading a zip of all artifacts) is handled by Jenkins core. That may be something our team will need to look into in the future. Our use case involves uploading ~5,000 files from one host and then pulling a zip (using the zip archive link generated by Jenkins) of those files down onto another host for deployment. We've been experiencing performance issues on both sides of the process.

hutson · May 07 '20 19:05

> Our use case involves uploading ~5,000 files from one host and then pulling a zip (using the zip archive link generated by Jenkins) of those files down onto another host for deployment.

Do not do that. *zip* URLs are not optimized and will force the Jenkins master to retrieve every file from S3, ZIP them up, then send that in a giant HTTP response. Not what you want. Much better to zip up all files inside the workspace where they are generated and archive that one file. The resulting artifact/… URL from the Jenkins master will just be a redirect to S3.

See https://jenkins.io/jep/202 for background.
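
For anyone landing here, a minimal Declarative Pipeline sketch of the pattern described above, assuming the Pipeline Utility Steps plugin for the `zip` step; the stage and path names are illustrative:

```groovy
pipeline {
    agent any
    stages {
        stage('Package and archive') {
            steps {
                // Bundle everything under build/output into one zip inside the
                // workspace ('zip' comes from the Pipeline Utility Steps plugin).
                zip zipFile: 'artifacts.zip', dir: 'build/output'
                // Archive the single zip; its artifact/... URL is then just a
                // redirect to S3, rather than the master assembling a zip of
                // thousands of individual artifacts on the fly.
                archiveArtifacts artifacts: 'artifacts.zip'
            }
        }
    }
}
```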

jglick · May 07 '20 19:05

@jglick so what are my next actions?

beryl-factset · May 07 '20 21:05

My next actions are to review, and perhaps rewrite a bit, once I have time to work on this plugin—it is on the back burner. But I do not recommend using this plugin with a large number of artifact files to begin with. If you use a handful of large artifacts it should behave better without modification. (The overhead of lots of little artifacts with external URLs is more with Jenkins core and other JEP-202 plugins than with this plugin per se.)

I should say that download performance of a single file would be fine, so uploading thousands (assuming this patch) but only actually using one or two would not be a problem. But if your use case is similar to @hutson’s then you should certainly switch to archiving a tarball or zip.

jglick · May 07 '20 22:05

Okay, thanks. Please let me know if you need anything from me!

beryl-factset · May 07 '20 22:05

> See https://jenkins.io/jep/202 for background.

Thank you for the reference.

> Not what you want.

I definitely concur. We are working on adopting the practice of pre-zipping files before archiving, but given the breadth of our user base, we know it's going to take time.

> My next actions are to review, and perhaps rewrite a bit, once I have time to work on this plugin

Thank you for your time. 👍

hutson · May 15 '20 22:05