Projects that use newer AWS-SDK version plus a non-default AWS profile = bad times
Describe the issue
If:
- you use npm versions 1, 2 or 7, and
- your arc app uses a newer SDK version than the one pinned in arc currently (2.712.0), and
- you use a profile named anything other than `default`,

then you will have a bad time when deploying, as the `default` profile will be used.
Steps to reproduce
Steps to reproduce the behavior:
- Make sure you set a non-default profile name in your app.arc file
- Make sure your arc app has a newer-than-2.712.0 version of the aws-sdk set up in your arc app's dependencies
- Try to run `arc deploy`
- The specific failure mode will depend on the context of your AWS accounts, but ultimately, `deploy` will try to deploy into the AWS account associated with your `default` profile, whereas you would expect it to deploy to the profile specified in your app.arc file.
Expected behavior
deploy would deploy to the profile specified in your app.arc file, and not to the default one.
Additional context
It took me a while to track this down, but the issue stems from the fact that if your app specifies a different aws-sdk version than the one architect pins to, two aws-sdk instances exist in the node_modules dependency tree: the aws-sdk shallowest in the tree will be the one specified by your arc app's dependencies, whereas the @architect/deploy dependency will have its own aws-sdk instance (which, as of this writing, is pinned to 2.712.0).
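To make that concrete, here's a rough sketch (the nested path and version numbers are illustrative, not necessarily where npm puts deploy's copy on your machine) of how the two copies end up as fully independent instances:

```js
// Two copies of aws-sdk in one dependency tree (paths are illustrative)
const appSdk = require('aws-sdk')                                    // the app's hoisted copy
const deploySdk = require('@architect/deploy/node_modules/aws-sdk')  // deploy's pinned copy

// Node caches modules by resolved file path, so these are distinct instances
console.log(appSdk.VERSION, deploySdk.VERSION)  // e.g. 2.1012.0 vs 2.712.0
console.log(appSdk === deploySdk)               // false
console.log(appSdk.config === deploySdk.config) // false: two independent global configs
```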
When architect loads up, it will do the following in this exact order:
- Pull in `deploy`, which will pull in `aws-sdk@2.712.0` (which is a peer dependency in deploy, and with npm v7, peer dependencies are automatically installed).
- At the moment the aws-sdk is pulled into deploy, the environment variable `AWS_PROFILE` is not set, so this instance of the AWS SDK defaults to using the `default` profile.
- Arc will then run its "before" routine, which also pulls in the aws-sdk. The issue is that this aws-sdk instance is different from the one pinned inside deploy. Next, the before routine defers to util's `banner` method, which does the work of figuring out which profile to use, setting the `AWS_PROFILE` environment variable, and setting the global configuration for the AWS SDK instance. Unfortunately, this AWS SDK instance is different from the one used in `deploy` in step 1. That is how we get to this issue.
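A minimal sketch of that ordering (module paths and the profile name are made up for illustration; this is not arc's actual code):

```js
// 1. deploy resolves its own aws-sdk before any profile has been chosen.
//    With AWS_PROFILE unset, this instance falls back to the `default` profile.
const deploySdk = require('@architect/deploy/node_modules/aws-sdk')

// 2. Later, util's banner/initAWS figures out the profile, sets the env var,
//    and configures the aws-sdk instance *it* resolves...
process.env.AWS_PROFILE = 'copper'
const utilsSdk = require('aws-sdk')
utilsSdk.config.credentials = new utilsSdk.SharedIniFileCredentials({ profile: 'copper' })

// 3. ...but deploySdk.config was never touched, so deploy still targets
//    whatever account the `default` profile points at.
```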
Workaround
I can work around this issue by explicitly specifying the `AWS_PROFILE` env var when running the deploy command, i.e. `AWS_PROFILE=copper arc deploy`
@reconbot brings up a suggestion on how to address this: treat the AWS profile name as something that is passed around from the main arc package to the sub-packages. I think that's a more resilient approach than the current side-effect-y way utils deals with the profile name. Additionally, once we start crossing boundaries between packages, it is hard to reason about when utils executes its current initAWS logic. Finally, once people start integrating different versions of the AWS SDK from the one pinned in arc (which may easily happen with userland plugins), this becomes even more difficult to trace, as we need to ensure the profile name is set properly in every arc sub-package's AWS SDK as we can't guarantee that there is only one version of the AWS SDK floating around.
Started to research how to do this.
- `@architect/utils`' `banner` sets up the AWS env vars via `initAWS()`; we'll want to break it out of there and use a common config object we pass around. We should absolutely still set the env vars but not use them in our own code. `initAWS()` sets up environment variables but also reads from existing aws.config (via a call to `require('aws-sdk')`) and allows existing ENV vars to override. It keeps track of the source of the creds through an env var. It also configures the global `aws-sdk` require directly.
- `@architect/deploy` gets passed cli args only, and pulls from `@architect/inventory` to get a region (the cli can override the region).
- `@architect/inventory` requires the `aws-sdk` and queries env vars, files, and a bunch of other stuff. It seems to need the env vars set by banner/initAWS but is also called before it?
It's possible that @architect/inventory gives different results in different places in the arc cli because it's being called both before and after the env vars are set up.
I didn't trace the other arc commands, but I think @architect/utils should export a function to load aws configs and set default aws env vars (and frankly not in a function that prints a banner), and we should be passing it around to every other module instead of relying on global scope or env vars for internal aws operations. We should also ensure it's called directly and relies on no other code.
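Sketching what that could look like (every name below is hypothetical; the point is explicit config passed as a value rather than env-var side effects):

```js
// Hypothetical, side-effect-free config loader that @architect/utils could export
async function loadAwsConfig ({ inventory, cli = {} }) {
  const profile = cli.profile || (inventory.aws && inventory.aws.profile) || process.env.AWS_PROFILE || 'default'
  const region  = cli.region  || (inventory.aws && inventory.aws.region)  || process.env.AWS_REGION  || 'us-west-2'
  // Credential lookup (shared ini files, existing env vars, dummy creds for
  // offline workflows) would live here too; nothing mutates process.env or
  // any global aws-sdk instance
  return { profile, region }
}

module.exports = { loadAwsConfig }

// Callers (deploy, sandbox, etc.) would receive the result and configure
// their own SDK instance from it:
//   const awsConfig = await loadAwsConfig({ inventory, cli: args })
//   deploy({ inventory, awsConfig, ...otherArgs })
```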
I should get someone else to confirm this behavior, but `initAWS()`:
- gets the inventory passed in from `require('@architect/inventory')`, which needs aws creds to read env vars (probable source of bug)
- gets a default region passed in from the inventory (`us-west-2`)
- sets `process.env.AWS_REGION = inv.aws.region`
I'm lost on the ordering and logic, but the following operations also happen:
- sets `process.env.ARC_AWS_CREDS`
- sets `process.env.AWS_PROFILE`
- loads the aws config files
- we check for existing env vars for `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`
- we set up dummy env vars and configs if we don't have good info
Eventually:
- we set up the global `require('aws-sdk')` with these settings
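Putting my reading of the above together as pseudocode (this is my reconstruction, not the actual @architect/utils source, and the `ARC_AWS_CREDS` value is a guess):

```js
// Rough reconstruction of the initAWS() flow described above (not the real code)
function initAWS ({ inventory }) {
  // inventory was already built via require('@architect/inventory'), which
  // itself needed creds to read env vars, before any of this profile setup ran
  process.env.AWS_REGION = inventory.aws.region || 'us-west-2'
  process.env.AWS_PROFILE = inventory.aws.profile || process.env.AWS_PROFILE || 'default'
  process.env.ARC_AWS_CREDS = 'profile' // tracks the source of the creds (value illustrative)

  // Load the aws config files, let existing AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY /
  // AWS_SESSION_TOKEN env vars win, and fall back to dummy values if nothing usable is found

  // Finally, configure the global SDK instance that *this* package resolves
  const aws = require('aws-sdk')
  aws.config.update({ region: process.env.AWS_REGION })
  aws.config.credentials = new aws.SharedIniFileCredentials({ profile: process.env.AWS_PROFILE })
}
```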
TLDR
Our bug appears to be caused by initAWS using the inventory before setting up profile information. The inventory connects to aws with the default profile and gives the wrong inventory back to the rest of the system.
Yes I agree 100% that the functionality of determining and setting profile info should be broken out from the banner printing 😆
The crusty initAWS/banner/env var logic aside, I think the crux of the bug is a) pinning arc to an aws-sdk version and b) assuming user projects will only ever use that one aws-sdk version. The AWS SDK is constantly updated: as new functionality is released by AWS, the SDK will move forward with new versions. We must expect that userland plugins will move in step with that to bring that functionality to arc apps.
I wonder how different this problem and its solution could look if we start from a place where we don't assume a single version of the SDK. I understand why we did that initially (to lock to the version of the SDK that is present inside Lambdas by default) - but this version changes as well (see here). If anything, this specific version of the SDK maybe should be locked in sandbox or in functions (the two packages that either mimic or interact with Lambda runtimes) - but why is it set as a peer dependency in deploy?
Maybe the right way to look at this problem is from the perspective of a user's app. How would a typical arc app's package.json look with respect to the aws-sdk, and how can arc play nice dependency-wise? What if the various arc packages loosened up the semver requirement on the aws-sdk such that it was e.g. ^2.880.0 instead of the current 2.880.0? I think for my situation that would resolve this issue, as presumably arc's and the app's aws-sdk dependency could then be unified.
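For instance, a quick check with the semver package (versions picked only for illustration):

```js
const semver = require('semver')

// An exact pin only ever matches itself, forcing a second nested copy
console.log(semver.satisfies('2.1012.0', '2.880.0'))  // false

// A caret range lets npm dedupe the app's newer aws-sdk onto arc's requirement
console.log(semver.satisfies('2.1012.0', '^2.880.0')) // true
console.log(semver.satisfies('2.712.0',  '^2.880.0')) // false: below the floor
```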
I sometimes have to do silly things with my code. I wouldn't want my deploy tools to
- change my global configuration
- use my dependencies that I may have done silly things to
For example, honeycomb and a dozen other APM systems monkeypatch the aws-sdk. When you use it in deploy, I don't want it throwing errors because honeycomb isn't set up. (This may come up in sandbox, probably not deploy.)
I also have a bad habit of tracing bugs by editing my node_modules directory. (This would come up in deploy.) I wouldn't want arc to run any of that.
fwiw many versions of the sdk shouldn't be an issue; aws only adds stuff they don't remove stuff. no idea how the new sdk is going to screw w this either but the creds lookup logic is def a nightmare to try and grok/follow. def need to break this logic out. the things that create events shouldn't be the things that print events. ideally we refactor so the 'updater' is really just a dumb printer and the things that do have data to emit should do so as events the updater is subscribed to.
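A toy sketch of that shape (names invented), where the pieces that have something to say emit events and the updater is just a subscribed printer:

```js
const { EventEmitter } = require('events')

// Hypothetical: workflow steps emit status events instead of printing directly
const status = new EventEmitter()

// The 'updater' becomes a dumb printer subscribed to those events
status.on('start', label => console.log(`... ${label}`))
status.on('done',  label => console.log(`ok  ${label}`))

// Anything with data to report just emits; it never touches the console itself
status.emit('start', 'Initializing AWS credentials')
status.emit('done',  'Initializing AWS credentials')
```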
Just want to call attention to this comment in particular, since it probably fits as well here as in the #1357 thread where I first posted it: https://github.com/architect/architect/issues/1357#issuecomment-1192945902
Good news, I think this issue is mitigated by migrating to aws-lite. See: https://github.com/architect/utils/pull/229
This all got a bit mucked up due to a variety of circumstances: legacy code; Architect allowing you to specify a credentials profile outside the normal aws-sdk loading strategy via `@aws profile`; the way Architect workflows can run both independently and via `@architect/architect`; and the fact that some workflows don't need real credentials, but do need dummy creds to operate offline.
This was made worse by the fact that merely requiring/importing aws-sdk would attempt to load credentials and store them in a global state bag. For various scenarios (testing, credential switching, etc.) this made things super complicated, especially when accounting for the above factors. We attempted to get a little control back – usually it worked, sometimes it didn't.
I'm reasonably certain aws-lite (and some related refactoring across various modules) will solve this. There's no weird, magical global credential state. Credentials are loaded into each client instance. If a profile is specified in @aws profile, we'll attempt to use that.
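For comparison, here's roughly what per-client credentials look like with aws-lite (a sketch based on my reading of the aws-lite docs; option names and import style may differ slightly):

```js
// Sketch: per-client credentials with aws-lite; no global config to clobber
const awsLite = require('@aws-lite/client')

async function main () {
  // Each client resolves its own creds; a profile can be passed explicitly
  const copper = await awsLite({ profile: 'copper', region: 'us-west-2' })

  // A second client with a different profile doesn't affect the first one
  const other = await awsLite({ profile: 'default', region: 'us-east-1' })
}

main().catch(console.error)
```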
Onward!
Heck yeah!