serverless-meta-sync

Proposals for an alternative meta sync methodology and workflow

Open ghost opened this issue 8 years ago • 0 comments

@ac360, in other threads, you've been asking for thoughts about how better to make meta sync's synchronization process work. So, here are some thoughts. Hopefully, they are useful, but I'm still relatively new to serverless (2 weeks), and the meta sync plugin, so could be off-base. Anyway, enough disclaimers, here goes...

At a high level, these proposals change meta sync's current approach so that you don't have to run commands to sync meta values. Instead, meta values are shared between team members automatically as part of the resource deploy command. This requires changing the algorithmic approach from synchronization to just-in-time, ordered loading. IOW, instead of synchronizing key/value pairs between multiple repositories, these proposals suggest a kind of "loading ladder" that looks for a value in the nearest location, backs off to a "further-away" location if the value is not found there, and repeats until the value is found or generated by CloudFormation. Once a value is found, it is (optionally) copied to a higher rung of the "loading ladder", so that future deployments reuse the value instantly. (NOTE: Serverless already implements this partially via CloudFormation Outputs, which generate key/value pairs during an sls resources deploy and store those generated values in "./_meta". What I'm suggesting generalizes and extends that notion.)
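To make the "loading ladder" concrete, here is a minimal Python sketch (Python only for brevity; Serverless itself is Node). It is an illustration under assumptions, not existing plugin code: each rung is modeled as a plain dict, the names `resolve`, `generate` and `promote` are hypothetical, and "copy to a higher rung" is simplified to "copy to the nearest rung".

```python
def resolve(key, rungs, generate=None, promote=True):
    """Return the value for `key` from the nearest rung that defines it.

    `rungs` is a non-empty list of dicts ordered from nearest to furthest.
    `generate` is a hypothetical stand-in for a value produced by
    CloudFormation when no rung has the key.
    """
    for index, rung in enumerate(rungs):
        if key in rung:
            value = rung[key]
            if promote and index > 0:
                # Assumption: promotion copies to the nearest rung.
                rungs[0][key] = value
            return value
    if generate is None:
        raise KeyError(key)
    value = generate(key)
    rungs[0][key] = value  # keep generated values locally for reuse
    return value


local, shared = {}, {"vpcId": "vpc-123"}
found = resolve("vpcId", [local, shared])
```

After the call, `local` also holds `vpcId`, so the next lookup stops at the first rung.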

PROPOSAL #1: Configurable load ordering of multiple meta directories

This proposed meta sync workflow is based (in part) on the observation that it seems non-ideal to pick a solution that requires re-inventing any part of the wheel that is Git. So, why not allow multiple meta directories, some of which are located outside of your serverless project and are git committed? Then, add an entry to your s-project.json file that points your serverless app to an array of meta paths, similar to a Java classpath, but for loading key/value pairs instead of jars. For example:

project_b:
  "custom": {
    "meta": {
      "load_order": [
        "./_meta",
        "../mycompany-aws-assets/project_b",
        "../mycompany-aws-assets"
      ]
    }
  }

This meta "load order" loads key/value pairs from all three paths, with earlier paths overriding values from later paths. So first, Serverless would load key/value pairs from the "./_meta" directory in the current serverless app. Values specified here take precedence. Then, the load algorithm would load any key/value pairs not already set from "../mycompany-aws-assets/project_b". If a key/value pair is not defined there either, the load algorithm continues to the next location in the list, "../mycompany-aws-assets". Each earlier directory in the load order overrides and extends the key/value space of all later directories in the load order.
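The "earlier overrides later" rule maps directly onto Python's `collections.ChainMap`, which searches its mappings in order and returns the first hit. The sketch below is only an illustration: `build_view` and `read_dir` are hypothetical names, and directory contents are faked with in-memory dicts instead of reading files.

```python
from collections import ChainMap


def build_view(load_order, read_dir):
    """Combine the key/value pairs of every path; earlier paths win.

    `read_dir` is a hypothetical callable that returns the key/value
    pairs stored at one path as a dict.
    """
    return ChainMap(*(read_dir(path) for path in load_order))


# In-memory stand-ins for the three directories in the example above.
fake_dirs = {
    "./_meta": {"stage": "dev"},
    "../mycompany-aws-assets/project_b": {"vpcId": "vpc-project-b"},
    "../mycompany-aws-assets": {"vpcId": "vpc-company", "region": "us-east-1"},
}
view = build_view(list(fake_dirs), fake_dirs.__getitem__)
```

Here `view["vpcId"]` comes from the project_b directory, while `view["region"]` falls through to the company-wide root.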

The "mycompany-aws-assets" directory could be structured as follows:

  mycompany-aws-assets
    |-- s-variables-common.json
    |-- s-variables-dev.json
    |-- s-variables-dev-useast1.json
    |-- s-variables-prod.json
    |-- s-variables-prod-useast1.json
    |-- project_a
    |     |-- s-variables-common.json
    |     |-- s-variables-dev.json
    |     |-- s-variables-dev-useast1.json
    |     |-- s-variables-prod.json
    |     \-- s-variables-prod-useast1.json
    \-- project_b
          |-- s-variables-common.json
          |-- s-variables-dev.json
          |-- s-variables-dev-useast1.json
          |-- s-variables-prod.json
          \-- s-variables-prod-useast1.json

So, key/values shared by all apps in the company are specified at the root level, but values that are application specific can be overridden in sub-directories. For example, a VPC that is shared by all apps within the company would be defined in root's "s-variables-common.json". However, if project_b uses its own VPC, it could override the company default VPC in its own "s-variables-common.json" file. And because all these aws-assets are kept outside the serverless application, you have the option of git-committing these assets without creating a security risk if you choose to open source the serverless app that depends on them.

Note: I included the second load path (i.e. "mycompany-aws-assets/project_b") for clarity. But in practice, I think that path should be implied and therefore omitted. All meta paths should have the option of hierarchy, where the root specifies values common to all apps, but each app can override root's key/value pair assignments in application-named sub-directories.
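If the application-named sub-directory is implied, the loader has to expand one meta path into an ordered list of candidate files. The sketch below shows one possible expansion. The precedence among the files is my assumption, since it is not specified above: the project directory outranks the root, and within a directory the stage-and-region file outranks the stage file, which outranks the common file. `candidate_files` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def candidate_files(root, project, stage, region):
    """List variable files from highest to lowest priority (assumed order)."""
    names = [
        f"s-variables-{stage}-{region}.json",  # most specific
        f"s-variables-{stage}.json",
        "s-variables-common.json",             # least specific
    ]
    base = PurePosixPath(root)
    directories = [base / project, base]       # project overrides root
    return [str(d / name) for d in directories for name in names]


files = candidate_files("../mycompany-aws-assets", "project_b", "dev", "useast1")
```

The first entry is the project's "s-variables-dev-useast1.json" and the last is the root's "s-variables-common.json".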

Lastly, the proposed meta sync workflow would be backward compatible, because by default your load order could include "./_meta" only, in which case it works as before. Implementation-wise, this seems smaller in scope than proposal #2 (below).

PROPOSAL #2: Extension to allow S3 and DynamoDB in the load_path

This proposal recognizes that some people may prefer to store their most sensitive aws-assets on S3 or DynamoDB. For some, this will be because storing such information on S3 is more in line with 12-Factor recommendations, or perhaps because they'd like to modify certain application settings without requiring a redeployment. So, why not evolve the "load_order" extension of proposal #1 to allow S3 and DynamoDB paths in the ordered load path specification? For example:

  "custom": {
    "meta": {
      "load_order": [
        "./_meta",
        "../mycompany-aws-assets",
        "dynamo://mySharedVarsTable",
        "s3://mycompany"
      ],
      "cf_outputs": "../mycompany-aws-assets/project_b"  // Store CF generated values here
    }
  }

Keys and values are loaded from this ordered array of paths during the sls resources deploy command. So, you wouldn't have to copy or sync values from S3 explicitly. Those key/value pairs would be loaded automatically and just-in-time during a deployment, from the first position in the load path array that yields a value. This array can include paths to S3 buckets or DynamoDB tables, implemented via serverless plugins, so that those storage endpoints can be visited in the search for key/value pairs.
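One way to picture the plugin hook is a dispatch on the scheme of each load path entry. The sketch below is a rough illustration only: `classify`, `lookup` and the `loaders` registry are hypothetical, and the loaders are stubs, so no real S3 or DynamoDB calls are made.

```python
from urllib.parse import urlparse


def classify(entry):
    """Split a load path entry into (backend, location)."""
    parsed = urlparse(entry)
    if parsed.scheme in ("s3", "dynamo"):
        return parsed.scheme, parsed.netloc + parsed.path
    return "file", entry  # anything else is treated as a directory path


def lookup(key, load_order, loaders):
    """Visit entries in order and return (value, entry) for the first hit.

    `loaders` maps a backend name to a callable returning a dict. Later
    entries are only contacted when earlier ones lack the key.
    """
    for entry in load_order:
        backend, location = classify(entry)
        values = loaders[backend](location)
        if key in values:
            return values[key], entry
    raise KeyError(key)


load_order = ["./_meta", "../mycompany-aws-assets",
              "dynamo://mySharedVarsTable", "s3://mycompany"]
```

Because `lookup` stops at the first hit, remote endpoints are not touched for keys that a local directory already provides.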

In some cases, using all the latest, JIT-loaded values from across the load path may be undesirable. For example, if you are working on a branch, you may want all key/value pairs to remain stable in the interim. For such cases, you could load the values you need to stabilize into the "./_meta" directory. Since "./_meta" is first on the load path, it takes precedence over all later paths and can therefore be used to "freeze" values during your branch work. Serverless could facilitate this by supporting an `sls variables cache` command that captures all key/value pairs from the load path and writes them to your app's "./_meta" directory. Once your branch work is done and you are ready to absorb the changes on S3, you can simply remove all or parts of your "./_meta" directory. Then, you're back to a live, JIT-loaded variables environment.
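A "freeze" along these lines could amount to flattening the load path into one snapshot and writing it to the first location. The sketch below is only an assumption about how such a command might behave: `freeze` is a hypothetical name, the layers are plain dicts ordered from highest to lowest priority, and the choice of "s-variables-common.json" as the output file is mine.

```python
import json
from pathlib import Path


def freeze(layers, meta_dir, filename="s-variables-common.json"):
    """Write the effective key/value pairs into `meta_dir` and return them."""
    snapshot = {}
    for layer in reversed(layers):  # apply the lowest priority first
        snapshot.update(layer)      # so higher-priority layers overwrite it
    target = Path(meta_dir)
    target.mkdir(parents=True, exist_ok=True)
    text = json.dumps(snapshot, indent=2, sort_keys=True)
    (target / filename).write_text(text)
    return snapshot
```

Deleting the written file (or individual keys in it) would "unfreeze" those values, since lookups would then fall through to the later paths again.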

Sharing variables with team members via the external, git committed directory (i.e. "../mycompany-aws-assets") is solved by git. But sharing variables through S3 or DynamoDB would require command line support that is somewhat different from what sls meta sync does today. I would suggest an sls variable push <var_name> command that finds the variable in the load path and moves it up or down in the load path. So, pushing a key/value pair stored at "../mycompany-aws-assets/project_b" down (i.e. to a lower, more remote position in the load path) would automatically deploy it to S3, at "s3://mycompany/project_b". Like the sls meta sync command, the push operation would report conflicts and let you resolve them in a manner similar to what meta sync does currently.
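The push operation could look roughly like the sketch below. All of it is hypothetical: rungs are plain dicts, `push` and `PushConflict` are made-up names, and a conflict is simply raised as an exception rather than resolved interactively.

```python
class PushConflict(Exception):
    """Raised when the destination already holds a different value."""


def push(key, rungs, destination):
    """Move `key` from the first rung that defines it to rungs[destination].

    Returns the index of the rung the key was found in.
    """
    source = next((i for i, rung in enumerate(rungs) if key in rung), None)
    if source is None:
        raise KeyError(key)
    if source == destination:
        return source  # nothing to move
    value = rungs[source][key]
    target = rungs[destination]
    if key in target and target[key] != value:
        raise PushConflict(
            f"{key!r}: {target[key]!r} would be replaced by {value!r}"
        )
    target[key] = value
    del rungs[source][key]
    return source
```

Raising before any write means a conflicting push leaves both rungs unchanged, which leaves room for a resolution step like the one meta sync offers today.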

Regarding CloudFormation's Outputs, serverless currently writes those values to your app's "./_meta" directory - a practice that may not work so well with this proposal. If you freeze values from S3 in your "./_meta" directory (as described above), then you'll probably want to keep them separate from locally generated ARNs etc. This would be made possible by a "cf_outputs" setting in s-project.json that tells serverless where (in the load path) to write CF Outputs. This way, generated ARNs can still be git committed if you choose, without exposing company secrets by including them in a potentially open source serverless app.
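Reading the proposed "cf_outputs" setting could be as small as the sketch below. It assumes the setting lives under "custom.meta" as in the example config and that "./_meta" remains the default when the setting is absent. `outputs_target` is a hypothetical helper, not an existing Serverless function.

```python
def outputs_target(project_config, default="./_meta"):
    """Return the location where CloudFormation Outputs should be written."""
    meta = project_config.get("custom", {}).get("meta", {})
    return meta.get("cf_outputs", default)


config = {"custom": {"meta": {"cf_outputs": "../mycompany-aws-assets/project_b"}}}
target = outputs_target(config)
```

Falling back to "./_meta" keeps today's behavior for projects that never set "cf_outputs".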

So, there it is. Sorry for the verbosity, but I hope this is clear and useful. My Node skills are probably too young for me to do this work, and my time is very limited. I may still take a crack at Proposal #1 if @ac360 and others believe this approach makes sense.

ghost avatar May 05 '16 21:05 ghost