
Don't re-upload assets to S3 if they're already present.

Open sammarks opened this issue 2 years ago • 2 comments

Feature Request 🛍️

When using deploySite(), it would be nice if existing files were not re-uploaded.

Use Case

Some projects I work on have very large video assets (2GB), so it's a pain to have to wait for those to be re-uploaded every time.

Possible Solution

Perhaps compare existing files inside the public folder of the deployed site (if there is one) based on file size, and don't re-upload the file if one already exists with the correct size.

Of course, it would also be nice to include some way of overriding this, and forcefully re-uploading all of the files (though I suppose you could just call deleteSite() first instead).
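The size-based check suggested above can be sketched as a small pure function. This is only an illustration, not Remotion's implementation: the helper name needsUpload is hypothetical, and the remote size is assumed to come from an S3 HeadObject call (its ContentLength field), with undefined meaning the object does not exist under that key.

```typescript
// Hypothetical size-only check: upload when the object is missing from the
// bucket or when the sizes differ. remoteSize is the ContentLength reported
// by HeadObject, or undefined when HeadObject returned a 404.
function needsUpload(localSize: number, remoteSize: number | undefined): boolean {
  if (remoteSize === undefined) {
    return true; // No object under this key yet.
  }
  return remoteSize !== localSize;
}
```

A force-upload flag, as suggested, could simply bypass this check and upload unconditionally.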

sammarks avatar May 02 '22 20:05 sammarks

Thanks for the issue, and I understand the desire! To do it correctly, we'd have to get a hash of the file that is in the S3 bucket.

I've done some quick research and it seems this is tricky, especially for large files. While files normally have an ETag that is the MD5 hash of their contents, this is not the case for large files uploaded via the Multipart API.

This is an interesting project: https://github.com/antespi/s3md5, but it is written in Bash and has no Windows support.
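For context, what tools like s3md5 compute is the ETag S3 assigns to multipart uploads: the MD5 of the concatenated binary MD5 digests of each part, suffixed with "-" and the part count. A minimal sketch of that scheme, assuming the whole file fits in a Buffer (a real implementation would stream the parts):

```typescript
import {createHash} from "node:crypto";

// Compute the ETag S3 assigns to a multipart upload: the MD5 of the
// concatenated binary MD5 digests of each part, plus "-<partCount>".
// Single-part uploads get a plain MD5 hex ETag instead.
function multipartEtag(data: Buffer, partSizeBytes: number): string {
  const partDigests: Buffer[] = [];
  for (let offset = 0; offset < data.length; offset += partSizeBytes) {
    const part = data.subarray(offset, offset + partSizeBytes);
    partDigests.push(createHash("md5").update(part).digest());
  }
  if (partDigests.length <= 1) {
    return createHash("md5").update(data).digest("hex");
  }
  const combined = createHash("md5")
    .update(Buffer.concat(partDigests))
    .digest("hex");
  return `${combined}-${partDigests.length}`;
}
```

Note the catch: the result depends on the part size the uploader used, so to match an existing ETag you must know (or guess) the original part size.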

While I could foresee us implementing it, it is too time-intensive to prioritize it. We are putting it in the backlog and looking for contributors.

If you like, you can also copy-paste and customize the deploySite() function: https://github.com/remotion-dev/remotion/blob/main/packages/lambda/src/api/deploy-site.ts

Using getAwsClient(), you can write a custom script that uploads the site to S3: https://www.remotion.dev/docs/lambda/getawsclient

JonnyBurger avatar May 04 '22 08:05 JonnyBurger

Sounds good! For additional information, here are the criteria `aws s3 sync` uses to determine whether something needs to be re-uploaded:

A local file will require uploading if one of the following conditions is true:

  • The local file does not exist under the specified bucket and prefix.
  • The size of the local file is different than the size of the s3 object.
  • The last modified time of the local file is newer than the last modified time of the s3 object.

I think it would be fine if this were the logic (thus avoiding having to use something like s3md5), as long as there is some sort of flag to override this behavior and force a sync each time. It could even be made opt-in instead.
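The three `aws s3 sync` criteria quoted above can be sketched as a pure decision function. Names here are hypothetical; in practice the remote metadata would come from a ListObjectsV2 or HeadObject call:

```typescript
interface ObjectMeta {
  size: number;
  lastModified: Date;
}

// Mirror of the `aws s3 sync` rules: upload when the object is missing,
// when the sizes differ, or when the local file is newer than the S3 object.
function requiresUpload(
  local: ObjectMeta,
  remote: ObjectMeta | undefined
): boolean {
  if (remote === undefined) {
    return true; // Not present under the bucket and prefix.
  }
  if (local.size !== remote.size) {
    return true; // Sizes differ.
  }
  return local.lastModified.getTime() > remote.lastModified.getTime();
}
```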

sammarks avatar May 04 '22 13:05 sammarks

Turns out even multipart uploads have an ETag. Implemented!

JonnyBurger avatar Dec 05 '22 17:12 JonnyBurger

Awesome, thank you!

sammarks avatar Dec 05 '22 17:12 sammarks