
Proposal: Improve large source code uploads with hash comparison and layers

Open · hkbarton opened this issue 4 years ago · 3 comments

Problem

Today, when a user deploys a project that includes a large amount of source code, we upload the entire source code every time the deploy command runs. Most of that source code is usually third-party dependencies or assets that rarely change; for example, a typical Next.js project includes around 100 MB of node_modules plus static assets such as images. None of this changes often, so it usually only needs to be uploaded on the very first deployment. Uploading it on every deploy is slow and unnecessary.

Solution

We can leverage layers to hold common/shared resources such as node modules or static assets, and only update a layer when the content of those resources changes. During deployment, we can compute and compare hashes to determine whether those resources have changed, and avoid re-uploading them if they haven't.
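A minimal sketch of the hash computing step, assuming SHA-256 and a folder hash taken over sorted relative paths plus per-file content hashes. The proposal does not specify the algorithm or the input format; `sha256_file`, `file_hashes` and `folder_hash` are hypothetical helpers, not part of any Serverless API.

```python
# Hypothetical sketch: one hash for a whole folder (e.g. node_modules).
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large assets are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_hashes(root: str) -> dict[str, str]:
    """Map each file's POSIX-style path (relative to root) to its SHA-256."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): sha256_file(path)
        for path in base.rglob("*")
        if path.is_file()
    }


def folder_hash(root: str) -> str:
    """Fold the per-file hashes into one digest for the whole folder.

    Paths are sorted so the result does not depend on filesystem ordering,
    and each path is hashed alongside its content hash, so renaming a file
    changes the folder hash as well.
    """
    digest = hashlib.sha256()
    for rel_path, file_digest in sorted(file_hashes(root).items()):
        digest.update(f"{rel_path}\0{file_digest}\n".encode("utf-8"))
    return digest.hexdigest()
```

On the next deploy the same value would be recomputed and compared with the stored one; equal hashes mean the upload can be skipped.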

Design

The entire process of computing hashes and skipping uploads can work as follows:

  1. Configure serverless.yml to tell the CLI which folders or files rarely change and can be skipped when their content is unchanged, by adding an accelerate section to inputs.src, e.g.:
```yaml
component: express
name: express-demo
stage: dev

inputs:
  src:
    src: ./src
    accelerate:
      - 'node_modules/**'
    exclude:
      - '**/*.log'
```
  2. Before uploading the source code, the CLI calls a new Serverless Platform API, getHashOfUploadedContent, to get the hash of the content in the accelerate section from the last upload. The CLI also computes the current hash of that content, then compares the two hashes to decide whether the content needs to be uploaded.
  3. If the hashes are not equal (either because this is the very first deployment and the backend has no stored hash yet, or because the content has actually changed), the CLI packs and uploads that content as usual and reports the hashes to the Serverless Platform API. The platform records the hashes on the instance record and stores them in the database so they can be queried later. The platform then puts/updates the content in COS/S3 and adds the COS/S3 URL to the Component inputs, so the component can use that content however it wants (e.g. by creating a layer from it).
  4. If the hashes are equal, the CLI skips packing and uploading that content and treats it as excluded from the source. The platform retrieves the previous COS/S3 URL from the instance record and passes it to the Component inputs so the content can still be downloaded.
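The CLI side of the steps above could look roughly like this sketch. It is hypothetical: `plan_upload` and `UploadPlan` are illustrative names, `previous_hash` stands in for whatever getHashOfUploadedContent would return (its response shape is not defined in the proposal), files are passed in memory for brevity, and Python's `fnmatch` only approximates the glob semantics of the `accelerate` patterns.

```python
# Hypothetical sketch of the CLI's upload decision; not a real Serverless API.
import fnmatch
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadPlan:
    regular_files: dict[str, bytes]      # always packed and uploaded
    accelerated_files: dict[str, bytes]  # matched by an `accelerate` pattern
    accelerated_hash: str                # would be reported to the platform
    upload_accelerated: bool             # False: reuse the previous COS/S3 URL


def matches_any(path: str, patterns: list[str]) -> bool:
    # fnmatch's "*" also matches "/", so "node_modules/**" covers the whole
    # subtree; this only approximates real glob semantics.
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)


def content_hash(files: dict[str, bytes]) -> str:
    """SHA-256 over (path, content) pairs, independent of dict order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def plan_upload(
    files: dict[str, bytes],
    accelerate: list[str],
    previous_hash: Optional[str],
) -> UploadPlan:
    accelerated = {p: c for p, c in files.items() if matches_any(p, accelerate)}
    regular = {p: c for p, c in files.items() if p not in accelerated}
    current_hash = content_hash(accelerated)
    # No stored hash (first deployment) or a different hash: upload again.
    needs_upload = previous_hash is None or previous_hash != current_hash
    return UploadPlan(regular, accelerated, current_hash, needs_upload)
```

When `upload_accelerated` is False, only `regular_files` would be packed, and the platform would reuse the COS/S3 URL stored on the instance record.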

hkbarton, Jun 05 '20 08:06

Hi @ac360 @medikoo, could you please help review this proposal? Thanks a lot!

tinafangkunding, Jun 08 '20 09:06

@hkbarton it's definitely a painful issue that needs to be solved.

I wonder, though, whether this can be tackled without introducing an extra configuration option. It would be great to avoid going down a road where our config accumulates tens of properties that require fine-tuning, as is the case with Webpack now. Of course, if there's no better way, then there's no argument.

Ideally, our setup should be smart enough to detect what has changed and submit just the diff, without asking the user to set up any config to make that work.

What if the CLI sent file hashes to the backend, the backend answered with which files differ from the uploaded state, and we uploaded only the changed ones? Then, on the component side, we could update the archive with the new files. Would that be possible?
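A rough sketch of the two pieces this would need, assuming the backend stores a {path: sha256} manifest per instance and the component keeps the previous upload as a zip archive (both are assumptions; `diff_manifests` and `update_archive` are hypothetical names):

```python
# Hypothetical sketch: backend manifest diff and component-side archive update.
import io
import zipfile


def diff_manifests(
    stored: dict[str, str], current: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Backend side: compare {path: sha256} manifests.

    Returns (paths the CLI should upload, paths that no longer exist).
    """
    to_upload = sorted(p for p, h in current.items() if stored.get(p) != h)
    deleted = sorted(p for p in stored if p not in current)
    return to_upload, deleted


def update_archive(
    old_zip: bytes, changed: dict[str, bytes], deleted: list[str]
) -> bytes:
    """Component side: keep unchanged entries, drop deleted ones, add changed ones."""
    skip = set(deleted) | set(changed)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(old_zip)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename not in skip:
                    # Reusing the ZipInfo keeps timestamps and permissions.
                    dst.writestr(info, src.read(info.filename))
            for path in sorted(changed):
                dst.writestr(path, changed[path])
    return out.getvalue()
```

Changed paths are dropped from the old archive before being re-added, so the rebuilt zip never contains duplicate entries.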

medikoo, Jun 08 '20 10:06


@medikoo In my mind, this is definitely possible. While it's a good engineering challenge, I think it's doable, and I agree with you that less configuration leads to a better developer experience. We can build a list of hashes of every file to be uploaded (we already need this to compute a single hash of the entire folder) and pass that list to the backend. The backend can then compare it with the stored version of the hash list and return the diff, so the CLI only needs to pack and upload the changed files.
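On the CLI side, the per-file hash list and the diff-only package could be sketched as follows, assuming the backend replies with a plain list of paths to upload (the response format is not defined in this thread; `build_manifest` and `pack_diff` are hypothetical names):

```python
# Hypothetical sketch of the CLI side: send a manifest, pack only the diff.
import hashlib
import io
import zipfile


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Per-file SHA-256 list the CLI would send to the backend."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def pack_diff(files: dict[str, bytes], paths_to_upload: list[str]) -> bytes:
    """Zip only the files the backend reported as new or changed."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(paths_to_upload):
            archive.writestr(path, files[path])
    return out.getvalue()


# Example: the backend (not shown) answered that only index.js changed.
files = {"index.js": b"console.log('v2')", "node_modules/x/index.js": b"lib"}
manifest = build_manifest(files)
payload = pack_diff(files, ["index.js"])
```

Unchanged files stay local; only the manifest and the diff archive would be sent.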

hkbarton, Jun 08 '20 13:06