Provide configuration for reserved and provisioned concurrency

Open monkut opened this issue 1 year ago • 1 comments

Lambda now provides a reserved and provisioned concurrency setting/configuration.

https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html#reserved-and-provisioned

https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html

zappa currently provides a keep_warm function to periodically keep a single lambda instance "warm". However, the current keep_warm method is not expected to reduce cold starts for more than a single instance.

This issue requests that reserved and provisioned concurrency be properly integrated with zappa. Potentially consider deprecating keep_warm in favor of these new concurrency settings.

Aug 17 '23 05:08 monkut

Hey, just wanted to provide some quick thoughts here! (Note: Ok, now having finished with writing what follows, it seems my thoughts turned out to be not-so-quick 🙃 - sorry, I'll provide a quick TL;DR.)

TL;DR

Unless new features are critical (e.g., supporting a new Python version), we should probably not add them to Zappa right now, as I'm in the process of replacing Zappa's ancient self-management of infra with the ability to interface with modern Infrastructure-as-Code (IaC) platforms that will enable all of the current features of Zappa, plus allow the use of pretty much every AWS feature that anyone could ever want.

Regarding the two features specifically mentioned, as far as adding either to Zappa:

- Reserved concurrency: I don't see much demand for it, but wouldn't be opposed to someone submitting a high-quality PR if they really wanted it configurable in Zappa before the IaC transition is complete. Although, as this is a feature that can already be enabled on any Lambda function in ~30 seconds using the AWS console, I'm not sure if the effort would be worth it.
- Provisioned concurrency: I strongly feel this belongs to the class of features that Zappa should never expose in its config, due to the fact that this is a setting that, if it is inadvertently/unknowingly enabled by a user, can easily result in a large, unexpected AWS bill. Zappa has always been a tool that can be messed around with by a novice trying out deployments with different settings without having to worry about receiving a large unexpected AWS bill as a result, and it shouldn't deviate from that expectation now, after 8 years. Add that to the fact that there are other deep-rooted problems that an implementation of this feature would run up against in Zappa, and I can't see this being a feature Zappa will ever directly support.

Of course, using IaC, knowledgeable users can enable either of these features (and countless more), but doing so via IaC requires far more intentionality than simply playing around with the settings found in a single JSON file (zappa_settings.json).

Background

For context, here are a few of the premises of Zappa (all IMHO, of course): what it is (and is not) and the scope of the role it should aim to play within the broader context of the overall AWS ecosystem and related tools:

Zappa's primary purpose/function/goal/objective is (and should continue to be) to enable fully functional, serverless deployments, via AWS Lambda, of APIs running on common Python web frameworks. In doing so, we should endeavor to make that process as simple as possible for as many users as possible (from students/beginners to GvR himself, and from pre-seed startups to the largest multinational corporations). This means supporting both Zappa's ability to create any required infra as necessary, while also allowing for existing infra that has been configured through other means, such as Infrastructure-as-Code (IaC) platforms, to be used if desired. Given the proliferation and maturation of IaC platforms since Zappa was first released 8 years ago, our focus on enabling/supporting/encouraging their use for infra configuration should be significantly increased compared to where Zappa's focus was 8 years ago, which was more on increasing the amount of infra configuration that Zappa could automate itself directly via the AWS API.
Zappa was never intended (nor should it intend) to enable complete, automated configuration of every potential infra feature that AWS offers (including all of the features supported by Lambda, although they generally deserve some consideration). IaC platforms are much better suited to do that job, and we should not try to make Zappa into another IaC platform. If I were to rewrite Zappa from scratch today, I'd probably completely remove any direct configuration of AWS infra altogether, and instead focus on supporting common IaC platforms that could be used to configure the necessary infra for Zappa. Default templates for the supported IaC platforms could be included (and/or automatically generated) that would enable the relevant IaC platform to configure the "default Zappa" infra that Zappa currently configures itself directly via the AWS API. These templates could be used as-is, or they could be used as a starting point that users/orgs could then customize to meet their specific needs.
- I think that is generally the primary goal we should be striving for in 2024, as offloading Zappa's infra config to IaC platforms will render most "can Zappa add support for x?" questions moot, limit the scope (Zappa has continually been suffering from scope creep ever since it was originally released) of what Zappa needs to do/support, and allow for developer focus to shift toward ensuring that Zappa is able to fulfill its primary purpose via a consistent, fast, high-quality, and bug-free experience.
Therefore, given the current state of Zappa, I do not believe we should automatically consider the addition of support for additional AWS infra configuration to be inherently good/positive/beneficial or the right choice (especially if that support would increase our reliance on interfacing directly with the AWS API to configure infra). In fact, for the time being, I would argue that our default should be to assume that such additions should not be made, unless there are extremely compelling arguments in favor of implementing immediate support for such an addition. In summary, I am proposing that, as a general rule, until Zappa can accomplish its primary function without needing to use the AWS API directly, we should err toward focusing on furthering our support/integration with common IaC platforms that already support extensive AWS infra configuration, rather than attempting to support the ever-growing list of AWS infra configuration options that Zappa, in its current state, would need to manually configure itself via the AWS API. Doing so will only serve to entrench Zappa further in its current no-man's-land state of functioning partly as a pseudo-IaC tool and delay how quickly we can extricate ourselves from that undesirable state.
In addition, we also need to consider if/how any proposed new feature would add to the complexity involved of getting Zappa to perform its primary function. Complexity is something that, by default, we should strive to minimize, given our goal of "mak[ing] that process as simple as possible". Similarly, we should be considering if/how any proposed new feature might increase the amount of AWS expertise/knowledge our users will need to have, which is a bar that, by default, we should be hesitant to raise, given our goal of making Zappa useful/usable "for as many users as possible (from students/beginners to ...".
It's also worth keeping in mind the small size of the team that remains active in Zappa's maintenance (especially when viewed in comparison to the size of Zappa's user-base) and the fact that the amount of time that each of us is choosing to volunteer from our limited free time (outside of our personal, professional, and/or other OSS responsibilities, etc.) toward the maintenance of Zappa is acutely finite.
- And the addition of any new functionality tends to increase the maintenance burden of a project (the possibility of introducing bugs, the need for documentation, the possibility of users requesting support in regards to the functionality, etc.), which is another reason why, with a few exceptions, I believe the addition of most new features likely should not be prioritized (until, as mentioned above, we eliminate the need for Zappa to function as a quasi-IaC platform). However, here are examples of some of the things that I think should continue to be prioritized as necessary:
  - Bug/security fixes: almost certainly most P0/P1-level (critical/high) bugs, P2 (medium) on a case-by-case basis, P3 (low) not likely
  - Adding support for more users, which will largely involve adding/improving support for things like: new architectures, versions of Python, dependencies, integrations with additional platforms/tools/services (beyond the obvious IaC) where useful/appropriate, Docker/containerization, OSes, etc.
  - Continuing to purge the large amount of ~8-year-old dev-facing tech debt, which I also consider to be among the highest areas of importance, since, absent a way to increase the number of hours in a day, increasing the efficiency of our team and other contributors (and lowering the barrier of entry for new contributors) is likely the best way we can accomplish more of the things we'd like to achieve (from bug fixes to IaC support to feature requests) within finite time constraints. This includes things like:
    - Adding/updating/replacing dev tools/dependencies
    - Shedding our Py2 roots more fully and bringing our repo up to the standards of a modern, Py3.8+ codebase
    - Improving/updating/speeding up our tests (both locally and via CI/CD)
    - Updating/optimizing/fixing our CI/CD pipeline
    - Making use of the vast array of automated tools that now exist that can accomplish some of our repo's basic maintenance tasks that often fail to get done in a timely manner (if at all) currently

Discussion

Whew 😅. Ok, with that context in mind, let's discuss the features that were brought up.

Introduction

"Reserved concurrency" and "provisioned concurrency" are actually quite different features, targeting different use cases (with potentially drastically different cost implications) so I think each should be discussed separately. To that end:

Reserved Concurrency

Description

Limits how many simultaneous invocations there may be of a given Lambda function. For example, most accounts start out with a total limit of 1000 simultaneous Lambda invocations across all functions. (Note: this is a limit that can easily be raised by asking AWS support - I've seen it set well into the millions at large, corporate entities. AWS just prefers to start accounts with relatively low limits so an inexperienced user/org doesn't accidentally run up a $100k+ bill due to a bug that is launching way more Lambda invocations than expected.) Assuming that an AWS account has that default level of 1000, setting the "reserved concurrency" on function x to 400 means that function x will be able to have up to 400 simultaneous Lambda invocations, but cannot exceed 400, even if there is demand for more. What it also means is that all other functions would now only have a pool of 600 "unreserved" Lambda invocations they can simultaneously share, as the 400 "reserved" for function x cannot be used by any other function.
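
For illustration, here is a minimal Python sketch of that arithmetic. The helper names and the `function-x` label are made up for this example. `put_function_concurrency` is the boto3 Lambda client call that sets reserved concurrency; the client is passed in as a parameter (it is expected to be `boto3.client("lambda")`), so nothing here talks to AWS on its own.

```python
ACCOUNT_CONCURRENCY_LIMIT = 1000  # default starting limit mentioned above


def unreserved_pool(account_limit, reserved_by_function):
    """Return how many concurrent executions remain for all other functions."""
    reserved_total = sum(reserved_by_function.values())
    if reserved_total > account_limit:
        raise ValueError("cannot reserve more than the account limit")
    return account_limit - reserved_total


def set_reserved_concurrency(lambda_client, function_name, limit):
    """Cap (and reserve) `limit` concurrent executions for one function.

    `lambda_client` is expected to be boto3.client("lambda").
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


if __name__ == "__main__":
    # Function x reserves 400, leaving 600 for everything else.
    print(unreserved_pool(ACCOUNT_CONCURRENCY_LIMIT, {"function-x": 400}))
```

Removing the setting again (boto3's `delete_function_concurrency`) returns the function to the shared unreserved pool.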

What that means for Zappa

One thing I think is clear here is that "reserved concurrency" does not attempt to achieve the same goals as Zappa's "Keep Warm" functionality and should/can not be considered as a potential replacement for it.
Therefore, I think "reserved concurrency" would need to be considered as a potential addition to the features Zappa enables, rather than a replacement.

Arguments against/for it being added to Zappa's config:

Bad cop/Devil's advocate/cons:
- Everything mentioned above in Background. TL;DR: it's 2024; rather than continuing to function as a quasi-IaC solution that slowly adds (or doesn't add) AWS infra features in piecemeal to a zappa_settings.json that has continually become more of a confusing mess as scope creep has continued for nearly a decade, Zappa should focus on increasing support for the various IaC platforms which exist specifically to allow for detailed configuration of AWS infra to be defined, which would offer the ability to configure this setting (and countless more) immediately, without the need for Zappa to develop and maintain its own manual implementation for configuring the setting by directly interacting with the AWS API.
- I have not seen many real-world uses of "reserved concurrency" in the wild (nor can I recall seeing an Issue asking for it, besides this one).
- I do have some level of concern that, if this was added to zappa_settings.json, users may not fully understand what it does (and/or get it mixed up with "provisioned concurrency"), and enabling it in one of the most common use cases (i.e., a user/org is deploying their web app's backend API via Zappa, and only has a handful of other Lambda functions - if any at all - that are typically going to be comparatively very minor/not user-facing/etc. in terms of importance/priority/etc. when compared to the API) would, if anything, just increase the likelihood of causing performance degradation (because now their main Lambda function for their API would no longer be able to use all 1000 available account-wide simultaneous Lambda invocations - it would be limited to whatever was set for "reserved concurrency").
  - So, forcing users who actually need "reserved concurrency" to take a minute to go and enable it within the AWS console (where it puts a detailed breakdown of what "reserved concurrency" actually does in users' faces - so they are more likely to read it - and is dead-simple to enable/configure) or via an IaC platform (which are typically much more extensively documented than Zappa) may provide some benefit.
Good cop/<whatever the opposite of Devil's advocate is>/pros:
- As far as Lambda features go, "reserved concurrency" is rather innocuous. At worst, it could cause performance issues that can be solved by removing it.

Summary

If the community actually does indicate a strong desire for the ability to enable "reserved concurrency" via Zappa's config, and someone submits a high-quality PR with complete test coverage that doesn't require an extensive review process, I suppose I wouldn't be opposed to merging it, as full integration with IaC platforms will likely take a while. But, this isn't something I would personally commit to adding support for, as I don't think it's an important enough feature to warrant pausing my work toward replacing Zappa's outdated manual infra configuration system with the ability to use various IaC platforms to do the infra configuration for Zappa.

Provisioned Concurrency

Description

Keeps a defined number of Lambda function instances active/"spun up"/warm, regardless of activity. For example, if the "provisioned concurrency" of function x is set to 100, then 100 instances of function x will run 24/7 indefinitely, even if function x is not invoked for days/weeks/months/years. If function x then receives 500 simultaneous invocations, 100 of those invocations will be able to take advantage of a "warm start", while the other 400 will have to "cold start". Roughly 5 minutes after the completion of those invocations, if function x does not again receive more than 100 simultaneous invocations, the 400 instances that were spun up in addition to the 100 kept active by "provisioned concurrency" will get spun down again. Users of "provisioned concurrency" not only have to pay for the Lambda runtime, but they also have to pay an additional fee levied by AWS for the use of the "provisioned concurrency" feature. In other words, the cost/min of running a Lambda function instance via "provisioned concurrency" is greater than the cost/min of running a Lambda function instance that is created normally, even if it is then "kept warm" by other means (discussed below).
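
To make the warm/cold split concrete, here is a small Python sketch of the example above. The helper names are hypothetical. `put_provisioned_concurrency_config` is the boto3 Lambda client call for this setting; the client is passed in (expected to be `boto3.client("lambda")`), and `alias` stands for whatever published alias or version is being targeted.

```python
def split_warm_and_cold(simultaneous_invocations, provisioned):
    """Return (warm_starts, cold_starts) for a burst of simultaneous invocations."""
    warm = min(simultaneous_invocations, provisioned)
    return warm, simultaneous_invocations - warm


def set_provisioned_concurrency(lambda_client, function_name, alias, count):
    """Keep `count` instances initialized for a published alias or version.

    `lambda_client` is expected to be boto3.client("lambda"). AWS does not
    accept $LATEST as the Qualifier, and the setting is billed while enabled.
    """
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=count,
    )


if __name__ == "__main__":
    # 500 simultaneous invocations against 100 provisioned instances.
    warm, cold = split_warm_and_cold(500, 100)
    print(f"warm starts: {warm}, cold starts: {cold}")
```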

What that means for Zappa

Note that Zappa configures API Gateway to target the special-case $LATEST alias of the Lambda function (which always targets the most recent version of the Lambda function), an easy way to ensure that the latest version of the deployed API is always the one served to users. However, the special-case $LATEST alias cannot be targeted by "provisioned concurrency" (a long-standing limitation imposed by AWS, although it seems like it should be a simple enough thing to enable with the pool of engineering talent they have available 🤷‍♂️).
- This limitation can already be worked around very easily with Zappa as-is (this info is probably relevant to #1289):
  1. Deploy Lambda function with Zappa (if not done already)
  2. Create a custom alias (ex. LIVE) targeting the latest version number (an auto-incrementing integer) of the Zappa-deployed Lambda function. While the special-case $LATEST alias cannot be the target of "provisioned concurrency", any custom alias that is user-created can be the target. Set this alias to be the target of "provisioned concurrency" and make sure to change API Gateway (if being used) to also target your custom alias rather than $LATEST.
  3. Now, in your CI/CD pipeline, after zappa update is run, all you have to do is use the AWS API to fetch the new version number of the function, and then use the AWS API to change the custom alias that "provisioned concurrency" is targeting to refer to the new version number. This can be done using the AWS CLI, via a Python script using boto, etc.
- However, if "provisioned concurrency" were offered as a setting to Zappa users, they'd likely expect it to work automatically, which would mean either:
  - Making a fundamental breaking change to the practice of using the special case $LATEST alias (which Zappa has done since day 1), which would require Zappa to make several additional API requests (slowing deploy time) for each deploy/update, even though only users using "provisioned concurrency" would benefit. Zappa would at least need to add further logic to handle functions deployed with older versions of Zappa that target $LATEST in order to convert them to the new practice of using a custom alias. There are probably other ramifications that would manifest as well.
  - Creating and maintaining a divergent deploy/update code path for users of "provisioned concurrency", which is typically considered a coding anti-pattern. And still, in this case, logic would be needed to convert the targeted alias if a user toggled "provisioned concurrency" on/off, with the same risk of causing downstream issues by doing so.
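
The post-deploy part of the alias workaround in the numbered steps above could look roughly like the following Python sketch. The helper names and the alias name `live` are hypothetical. `list_versions_by_function` and `update_alias` are existing boto3 Lambda client calls. The client is passed in (expected to be `boto3.client("lambda")`), and the sketch assumes the alias was already created by hand as described in step 2.

```python
def latest_published_version(lambda_client, function_name):
    """Return the highest numbered version of a function as a string."""
    versions = []
    kwargs = {"FunctionName": function_name}
    while True:
        page = lambda_client.list_versions_by_function(**kwargs)
        # Skip the $LATEST pseudo-version; only numbered versions can back an alias here.
        versions += [v["Version"] for v in page["Versions"] if v["Version"] != "$LATEST"]
        marker = page.get("NextMarker")
        if not marker:
            break
        kwargs["Marker"] = marker
    if not versions:
        raise ValueError(f"{function_name} has no published versions")
    # Compare numerically so that "10" sorts after "9".
    return str(max(int(v) for v in versions))


def repoint_alias(lambda_client, function_name, alias_name):
    """Point an existing alias at the newest published version."""
    version = latest_published_version(lambda_client, function_name)
    lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=version,
    )
    return version
```

In a CI/CD pipeline this would run right after `zappa update`, for example as `repoint_alias(boto3.client("lambda"), "my-zappa-function", "live")`, where the function name is a placeholder.
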
Enabling Zappa's "Keep Warm" functionality is more-or-less equivalent to having a "provisioned concurrency" of 1 set on the Lambda function, without needing to pay the additional "provisioned concurrency" premium.
Therefore, I think "provisioned concurrency" could either be considered as a potential addition to the features Zappa enables or a replacement for "Keep Warm". If I had to choose between the two, I'd strongly prefer the former, since it would allow for a "cost-effective" (and sane) option to remain (more on that later).
Another option that, IMO, would be significantly more viable to actually implement, would be to consider allowing Zappa's "Keep Warm" to be more configurable. There are many other projects that already implement a configurable "Keep Warm" (example). For users that need to keep many instances warm, this would allow users to configure "Keep Warm" to apply to many instances, functioning very similarly to "provisioned concurrency", but without having to pay the "provisioned concurrency" premium (and without the implementation complexity mentioned above). However, is it worth it for Zappa to take the time to implement this level of configuration to our "Keep Warm" functionality when users could simply use an existing project that has already implemented that level of configuration, such as the example above, and set it to target the Lambda deployed by Zappa? Could a simple link in the README provide access to the benefit with none of the work?
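
For readers unfamiliar with how a configurable "Keep Warm" typically works, here is a minimal Python sketch of the general idea, not Zappa's implementation or that of any particular project. It sends several overlapping invocations so that Lambda has to serve them from separate instances. The function name `warm_instances` and the `{"warmup": true}` payload are hypothetical; the target handler would need to recognize such a ping and hold it briefly so the requests actually overlap. `invoke` is the boto3 Lambda client call, and the client is passed in (expected to be `boto3.client("lambda")`).

```python
import json
from concurrent.futures import ThreadPoolExecutor


def warm_instances(lambda_client, function_name, count):
    """Send `count` overlapping warm-up invocations to one function.

    Returns the HTTP status code of each invocation.
    """
    # Hypothetical marker payload; the handler must know to treat it as a ping.
    payload = json.dumps({"warmup": True}).encode()

    def ping(_):
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",
            Payload=payload,
        )
        return response["StatusCode"]

    # One thread per ping so the requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(ping, range(count)))
```

Run on a schedule every few minutes, something like this keeps roughly `count` instances warm, and every ping is a billed invocation, which is the cost risk discussed in the cons below.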

Arguments against/for it being added to Zappa's config:

Bad cop/Devil's advocate/cons:
- Everything mentioned above in Background. TL;DR: it's 2024; rather than continuing to function as a quasi-IaC solution that slowly adds (or doesn't add) AWS infra features in piecemeal to a zappa_settings.json that has continually become more of a confusing mess as scope creep has continued for nearly a decade, Zappa should focus on increasing support for the various IaC platforms which exist specifically to allow for detailed configuration of AWS infra to be defined, which would offer the ability to configure this setting (and countless more) immediately, without the need for Zappa to develop and maintain its own manual implementation for configuring the setting by directly interacting with the AWS API.
- The issues mentioned above concerning the specific implementation problems that adding support for "provisioned concurrency" would present.
- As mentioned above, the fact that "Keep Warm" can be configured to work very similarly to "provisioned concurrency" (either via an existing 3rd party project or by modifying our own implementation/config), without needing to pay AWS their "provisioned concurrency" premium.
- Here is perhaps my most major concern about allowing "provisioned concurrency" (and/or a "Keep Warm" implementation that could function similarly) to be configured in zappa_settings.json. Zappa has always been something that students/beginners can play around with and test without needing to worry much about accidentally racking up a huge AWS bill (a Lambda function doesn't cost anything until it gets invoked). Adding an easy way to enable "provisioned concurrency" in zappa_settings.json would fundamentally remove that implicit safety that users have come to expect over the past 8 years. A student is playing around with Zappa, deploys a dozen or so Lambda functions, forgets about them. Today, nothing happens. If we had a provisioned_concurrency setting that he set to 1000, a month later, AWS sends him a $50k bill. Since part of Zappa's mission is to provide its functionality to as many users as possible, including students/beginners, I am extremely hesitant to add any configuration setting that could easily be inadvertently/unknowingly enabled and result in users unwittingly being charged major sums of money by AWS.
  - A configurable "Keep Warm" presents the same risks, although, all things equal, it couldn't generate as large a bill because it wouldn't result in the "provisioned concurrency" premium being charged. However, it would still enable risk of major costs. I would much rather have a user make a conscious choice to use another project which solely implements "Keep Warm" and dedicates its documentation to explaining what it is, how it works, the risks of its usage, etc., because any warning about the risks of a configurable "Keep Warm" implementation in Zappa would end up being "just another line item" buried in our README that wouldn't get read.
- This is small potatoes compared to all of the problems above, but additionally, "provisioned concurrency" is basically unnecessary in 99.99% of cases nowadays in 2024. Speeding up cold starts has long been one of the AWS Lambda team's biggest priorities. They have gotten to be really, really fast by now. This article, now 3 years old, showed that 2/3rds of Python cold starts took ~250ms or less, while 95% took ~500ms or less. They've only gotten faster since then. I had an engineer question if Lambda cold starts were causing a performance issue less than a year ago, and I told him that it had nothing to do with Lambda cold starts, but he was welcome to go look in CloudWatch if he wanted proof. So he did, and, surprise, it wasn't the cold starts - the Python runtime was consistently instantiated by Lambda in ~75-125ms.
  - When cold starts are noticeably "slow", it's almost invariably due to the deployment package (i.e., function, application) itself and/or how it is configured/deployed, not Lambda. For example, using Zappa's slim_handler setting will kill cold start time, because the application has to be downloaded from S3, extracted, installed, configured, and initialized on every cold start. The other major mistakes I've seen made are apps that do too much in their lambda_handler function/module (hint: more than the bare minimum is usually too much - try to do things after your app has been instantiated) and/or apps that have to make many requests outside of Lambda to get everything they need to initialize. For example, users will store keys in KMS, secrets in Secrets Manager, assets in S3, need data from RDS (or, even worse, need to make requests to external APIs) in order to get all of the settings their application needs to initialize. All of those requests will slow cold starts significantly. All settings needed to initialize an app should be kept in the Lambda functions' environment variables. Getting anything else (like from S3, RDS, or a 3rd party API) should be done after your app has been instantiated.
  - Zappa also offers an optional setting that can help speed up application cold start times significantly (it says "experimental", but it's been in use for over 2 years by now, so I think that tag can/should be removed at this point). IMO, if there's any action that should be taken here, it would be enabling that setting by default.
  - It's also important to note that Zappa creates Lambda functions with 512MB of RAM by default, which not only isn't very much memory these days, but, since Lambda scales CPU power with the amount of RAM configured (AWS documents one full vCPU at 1,769MB), it means that the default Lambda functions created by Zappa only get a fraction of a vCPU. Increasing this to 4096MB or higher can often greatly speed up cold start times (and general performance), especially for large, monolithic applications (and/or applications with many large dependencies).
  - Zappa also supports AWS X-Ray, which can be very useful in pinpointing what part(s) of your application's cold start process are causing slowdowns.
  - When all else fails, creating a custom Docker image with all dependencies and the user's application preinstalled is usually a sure-fire way to guarantee consistent, relatively fast cold start times (Lambda's cold start times for Docker are still currently slower than most of its native runtimes, but Docker cold starts have been catching up, and, given how relatively "young" Docker image deployments are on Lambda, I expect there is still significant headroom for improvement), and Docker makes it much harder for users to create a deployment package that results in a very slow application cold start. (Zappa already supports deploying Docker images, but perhaps we should push them more prominently as a potential solution to slow cold starts and a better/faster alternative to using slim_handler.)
  - If someone is still convinced they need "provisioned concurrency" at this point, they should probably use one of the advanced "Keep Warm" projects to get nearly identical performance at a lower cost.
  - After trying that, if they are still convinced that things aren't fast enough (which, at this point, means that differences of milliseconds are significant to them), then Lambda (and the Python language itself, tbh, if they are using Zappa) probably isn't the right tool for their use case. They probably want to use something like a cluster of EC2 instances behind an ALB (potentially with autoscaling enabled via Fargate or similar if they went with Lambda originally for its ability to horizontally scale to handle traffic bursts) and use an entirely asynchronous web framework (ideally in a much faster language than Python that has true first-class support for async/multi-threading), etc.
  - The ~0.01% of times it might still make some sense in 2024 to use "provisioned concurrency" is when it's a temporary measure to put off having to deal with tech debt (which would typically entail either optimizing their application's cold start times or moving off of Lambda). This might make sense for a fast-growing startup that would rather focus on adding new features to attract customers/investors than working on their infrastructure. It also might make sense for a slow-moving large corporation, since the added expense from "provisioned concurrency" would basically be a rounding error on the balance sheet of a mega corp, and because they know it could take them a long time to cut through the red tape to get the necessary changes made to their infrastructure.
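
As a rough illustration of the cold start suggestions in the list above, the following Python sketch prints a `zappa_settings.json` fragment. The stage name `production` and the chosen values are assumptions for the example. `memory_size`, `keep_warm`, `xray_tracing`, and `slim_handler` are existing Zappa settings, but a real settings file needs further keys (app entry point, S3 bucket, and so on) that are omitted here.

```python
import json


def build_stage_settings(memory_size=4096):
    """Return a settings fragment for one hypothetical stage named "production"."""
    return {
        "production": {
            "memory_size": memory_size,  # MB; Lambda allocates CPU in proportion to memory
            "keep_warm": True,           # Zappa's existing keep-warm ping
            "xray_tracing": True,        # trace where cold start time is being spent
            "slim_handler": False,       # avoid the extra S3 download on cold start
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_stage_settings(), indent=4))
```
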
Good cop/<whatever the opposite of Devil's advocate is>/pros:
- Um... I guess it gives devs/orgs the option to pay money to avoid needing to write better application code or move to a more appropriate platform for their use case? Still, an advanced "Keep Warm" function could do the same for less money.

Summary

I really don't think adding "provisioned concurrency" support to Zappa is a good idea. The fact that it would break the implicit contract Zappa has had with its users since its inception that "you can (safely) use Zappa to mess around with deploying your apps to Lambda" by adding a setting that could deploy a Lambda function that would immediately begin accruing charges on their AWS bill is a complete non-starter by itself, IMO, and that's far from the only concern it would bring.
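
To give a sense of scale for the forgotten-deployment scenario described in the cons, here is a back-of-the-envelope Python estimate. The per-GB-second rate is an assumption used purely for illustration (pricing varies by region and architecture and changes over time, so check the current AWS pricing page), and the function name is hypothetical. The inputs mirror the scenario: a dozen functions, a setting of 1000 each, and Zappa's default 512MB of memory, left idle for 30 days.

```python
# Assumption: illustrative provisioned concurrency rate in USD per GB-second.
ASSUMED_RATE_PER_GB_SECOND = 0.0000041667
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60


def idle_provisioned_cost(
    functions,
    provisioned_per_function,
    memory_mb,
    rate=ASSUMED_RATE_PER_GB_SECOND,
    seconds=SECONDS_PER_30_DAYS,
):
    """Estimate the charge for provisioned instances that are never invoked."""
    memory_gb = memory_mb / 1024
    return functions * provisioned_per_function * memory_gb * seconds * rate


if __name__ == "__main__":
    cost = idle_provisioned_cost(functions=12, provisioned_per_function=1000, memory_mb=512)
    print(f"estimated 30-day charge: ${cost:,.0f}")
```

Under these assumptions the result lands in the tens of thousands of dollars without a single request being served.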

Jan 12 '24 00:01 javulticat

Hi there! Unfortunately, this Issue has not seen any activity for at least 90 days. If the Issue is still relevant to the latest version of Zappa, please comment within the next 10 days if you wish to keep it open. Otherwise, it will be automatically closed.

Apr 11 '24 03:04 github-actions[bot]

Hi there! Unfortunately, this Issue was automatically closed as it had not seen any activity in at least 100 days. If the Issue is still relevant to the latest version of Zappa, please open a new Issue.

Apr 21 '24 03:04 github-actions[bot]
