gatsby-plugin-s3
Request: Incremental uploads
Hi there!
Currently, uploading my 1500 blog posts to S3 takes ~50% of my total build time. Most of the time only 1 or 2 files are changed in the public directory.
Are there any plans to implement partial/incremental uploading? Perhaps behind a feature flag?
This should already be enabled by default. We check the ETag (hash) and existence of the file in the bucket. https://github.com/jariz/gatsby-plugin-s3/blob/master/src/bin.ts#L235
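For reference, the comparison is conceptually something like the sketch below. This is a minimal per-file version using aws-sdk v2 and headObject, not the plugin's actual code (the plugin fetches ETags in bulk via listObjectsV2), and it only holds for non-multipart uploads, where the S3 ETag is the quoted MD5 of the object body.

```typescript
// Minimal sketch of the ETag comparison (aws-sdk v2, one headObject per file).
// For non-multipart uploads the S3 ETag is the quoted MD5 of the object body,
// so it can be compared against a locally computed hash.
import { S3 } from 'aws-sdk';
import { createHash } from 'crypto';
import { readFileSync } from 'fs';

const s3 = new S3();

const isUpToDate = async (bucket: string, key: string, localPath: string): Promise<boolean> => {
    const localETag = `"${createHash('md5').update(readFileSync(localPath)).digest('hex')}"`;
    try {
        const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
        return head.ETag === localETag;
    } catch {
        // A missing object (404) simply means it still needs to be uploaded.
        return false;
    }
};
```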
Please provide a reproduction repo or update to the latest version if you haven't already.
Thanks Jariz! My problem is that I now have over 10k objects in my S3 bucket, and it's taking about 30 seconds to do all that checking against every item in the bucket.
Is there a way to use filesystem indicators to not have to check everything against s3?
I think this is a great idea. It will be especially important once we implement #59, which I think will double the number of requests needed to check if objects are up to date.
My suggestion is that we store a file representing the current state of the deployed objects in S3 itself. The reason is that a local file can easily go out of sync if you deploy from multiple computers. It's then a single S3 request to download the state file, instead of one or two requests per object.
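To make the idea concrete, it could look roughly like this. This is a hypothetical sketch with aws-sdk v2; the state file key and JSON shape are made up for illustration, nothing like this is implemented yet.

```typescript
// Hypothetical state file stored alongside the site in the bucket.
// One getObject to read it, one putObject to write it back after a deploy.
import { S3 } from 'aws-sdk';

const s3 = new S3();
const STATE_KEY = '.gatsby-plugin-s3-state.json'; // illustrative key name

// object key -> ETag of the version deployed last time
type DeployState = Record<string, string>;

const loadState = async (bucket: string): Promise<DeployState> => {
    try {
        const res = await s3.getObject({ Bucket: bucket, Key: STATE_KEY }).promise();
        return JSON.parse((res.Body as Buffer).toString('utf-8'));
    } catch {
        // No state file yet: fall back to a full comparison against the bucket.
        return {};
    }
};

const saveState = async (bucket: string, state: DeployState): Promise<void> => {
    await s3.putObject({
        Bucket: bucket,
        Key: STATE_KEY,
        Body: JSON.stringify(state),
        ContentType: 'application/json',
    }).promise();
};
```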
I'd be happy to review a PR for this.
I am seeing large build times for a big site of mine as well, uploading every time > 100k files. Could this be a real bottleneck? Right now deploy time is around 15-20 minutes on a medium+ CircleCI instance.
> uploading every time > 100k files.
How are you measuring this number? If 100k files are actually being uploaded each time, that suggests a separate issue (where the ETag isn't matching for some reason), but if it's just scanning 100k files then that means it's making 100 requests to listObjectsV2. I wouldn't expect that to take too long.
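Roughly speaking, the scan is just paginated listing, something like the sketch below (aws-sdk v2; not the plugin's exact code): each listObjectsV2 call returns at most 1,000 keys, hence ~100 sequential requests for ~100k objects.

```typescript
// Sketch of a paginated bucket scan (aws-sdk v2). Each page holds up to
// 1,000 keys, so a bucket with ~100k objects takes ~100 requests to list.
import { S3 } from 'aws-sdk';

const s3 = new S3();

const listAllETags = async (bucket: string): Promise<Map<string, string>> => {
    const etags = new Map<string, string>();
    let token: string | undefined;
    do {
        const page = await s3
            .listObjectsV2({ Bucket: bucket, ContinuationToken: token })
            .promise();
        for (const obj of page.Contents ?? []) {
            if (obj.Key && obj.ETag) {
                etags.set(obj.Key, obj.ETag);
            }
        }
        token = page.NextContinuationToken;
    } while (token);
    return etags;
};
```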
I wonder if part of the issue is that this line is unoptimised.
Also, I've realised that without the improvement suggested earlier, #59 will actually result in 100000% more requests being made, not just double. So I really think this is something we'll want to implement.
Not sure how many files are really uploaded each time, but the build has > 70k pages. If we account for the page-data and some other files, then it should be more than 100k. On the other hand, I do think the files always change, since the CSS/JS files change, so I would expect the HTML files to change as well. I tried with a parallel limit of 100 as well and there was no change.
If you're expecting every page to change, then every page will need to be re-uploaded. Not much we can do about that. Do you really need to be changing your global CSS/JS that frequently?
It's not frequent; it's during full rebuilds, like when we actually change the code. Maybe in the end what takes time is really the upload of so many pages. I wonder if there's some possibility to do some logging? I'm not sure exactly how the plugin works internally, but it would be nice to know how many files were uploaded, skipped, and deleted during a run, and possibly to turn on some metrics as well (how many seconds were spent uploading, etc.).
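To illustrate the kind of summary I mean (purely hypothetical; I don't know what the plugin logs internally, so this is just a sketch of the output I'd find useful):

```typescript
// Hypothetical per-run deploy summary: counters plus a wall-clock timer.
interface DeployStats {
    uploaded: number;
    skipped: number;
    deleted: number;
}

const stats: DeployStats = { uploaded: 0, skipped: 0, deleted: 0 };
const started = Date.now();

// ...during the run, increment stats.uploaded / stats.skipped / stats.deleted...

const elapsed = (Date.now() - started) / 1000;
console.log(
    `Uploaded ${stats.uploaded}, skipped ${stats.skipped}, deleted ${stats.deleted} ` +
        `in ${elapsed.toFixed(1)}s`
);
```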