Don't re-upload assets to S3 if they're already present.
Feature Request 🛍️
Whenever using deploySite(), it would be nice if existing files were not re-uploaded.
Use Case
Some projects I work on have very large video assets (2GB), so it's a pain to have to wait for those to be re-uploaded every time.
Possible Solution
Perhaps compare existing files inside the public folder of the deployed site (if there is one) based on file size, and don't re-upload a file if one already exists with the correct size.
Of course, it would also be nice to include some way of overriding this and forcefully re-uploading all of the files (though I suppose you could just call deleteSite() first instead).
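The size-based check proposed above could be sketched roughly like this. This is only an illustration, not Remotion's actual implementation: the remote index would be built from an S3 ListObjectsV2 response in practice, and the force flag is the proposed override:

```typescript
type LocalFile = {key: string; size: number};

// Map of S3 key -> object size, e.g. built from a ListObjectsV2 response.
type RemoteIndex = Map<string, number>;

// Return the subset of local files that actually need uploading.
// With `force: true`, everything is re-uploaded regardless of what exists.
const filesToUpload = (
  localFiles: LocalFile[],
  remote: RemoteIndex,
  force = false,
): LocalFile[] => {
  if (force) {
    return localFiles;
  }
  return localFiles.filter((f) => remote.get(f.key) !== f.size);
};
```

With this, a 2GB asset whose size matches the existing S3 object would be skipped entirely, while passing force re-uploads everything.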
Thanks for the issue, and I understand the desire! To do it correctly, we'd have to get a hash of the file that is in the S3 bucket.
I've done some quick research, and it seems this is tricky, especially for large files. While an object's ETag is normally the MD5 hash of its contents, that is not the case for large files uploaded through the Multipart Upload API.
This is an interesting project: https://github.com/antespi/s3md5 but it has no Windows support and is written in bash.
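For reference, the technique s3md5 uses can be expressed in Node as well: S3's multipart ETag is the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. A minimal sketch, with the caveat that the part size is an assumption and must match whatever the original uploader used (8 MiB is the AWS CLI default):

```typescript
import {createHash} from 'node:crypto';

// Compute the ETag S3 assigns to a multipart upload: the MD5 of the
// concatenated per-part MD5 digests, suffixed with "-<part count>".
// `partSize` must match the part size the original uploader used
// (8 MiB here is an assumption; it is the AWS CLI default).
const multipartEtag = (data: Buffer, partSize = 8 * 1024 * 1024): string => {
  const partDigests: Buffer[] = [];
  for (let offset = 0; offset < data.length; offset += partSize) {
    const part = data.subarray(offset, offset + partSize);
    partDigests.push(createHash('md5').update(part).digest());
  }
  const combined = createHash('md5')
    .update(Buffer.concat(partDigests))
    .digest('hex');
  return `${combined}-${partDigests.length}`;
};
```

The dependence on the uploader's part size is exactly why this is tricky to do reliably for arbitrary buckets.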
While I could foresee us implementing it, it is too time-intensive to prioritize it. We are putting it in the backlog and looking for contributors.
If you like, you can also copy-paste and customize the deploySite() function: https://github.com/remotion-dev/remotion/blob/main/packages/lambda/src/api/deploy-site.ts
Using getAwsClient(), you can write a custom script that uploads the site to S3: https://www.remotion.dev/docs/lambda/getawsclient
Sounds good! For additional information, here are the criteria s3 sync uses to determine whether something needs to be re-uploaded:
A local file will require uploading if one of the following conditions is true:
- The local file does not exist under the specified bucket and prefix.
- The size of the local file is different than the size of the s3 object.
- The last modified time of the local file is newer than the last modified time of the s3 object.
I think it would be fine if this was the logic (thus avoiding having to use something like s3md5), as long as there is some sort of flag to override this behavior and force a sync each time. It could even be made opt-in instead.
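The three s3 sync conditions above translate into a small predicate. The shape of the metadata objects here is assumed for illustration, and the force parameter stands in for the proposed override flag:

```typescript
type FileMeta = {size: number; mtimeMs: number};

// Mirror `aws s3 sync`: upload when the object is missing from the
// bucket, the sizes differ, or the local file is newer than the
// remote object. `force: true` is the proposed escape hatch.
const needsUpload = (
  local: FileMeta,
  remote: FileMeta | undefined,
  force = false,
): boolean => {
  if (force) return true;
  if (remote === undefined) return true; // not in the bucket yet
  if (local.size !== remote.size) return true; // sizes differ
  return local.mtimeMs > remote.mtimeMs; // local copy is newer
};
```

Note that the mtime rule makes the check conservative: a re-saved but byte-identical file would still be re-uploaded, which is why a hash-based comparison is the more precise (if harder) option.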
Turns out even multipart uploads have an ETag. Implemented!
Awesome, thank you!