powertools-lambda-typescript Maintenance: improve continuous code quality automation

Maintenance: improve continuous code quality automation

Open dreamorosi opened this issue 6 months ago • 4 comments

Summary

Currently we are running only limited code quality checks before merging a PR (linting and unit tests), while integration tests are still ran manually by maintainer either as a part of a PR review or before a release.

This is error prone and increases the risk of issues slipping into the codebase.

This issue came out of a discussion under #2072.

Why is this needed?

So that we can have code quality checks on the code merged on main and increase confidence on what's in it.

Which area does this relate to?

Automation, Tests

Solution

We should create a new workflow called "Code Quality" with trigger to be defined with several steps:

reusable "Run unit tests" workflow (to be renamed and/or split to reflect that it runs also linting)
integration/e2e tests
in the future fuzzying

The linting and unit tests complete within ~2 minutes while the integration tests take 8-9 minutes to complete.

We are already parallelizing via a matrix of architectures, runtime version, and packages. This results in 35+ tests and (~80+) CloudFormation stacks being deployed which in some cases seems to trigger some queuing in CloudFormation.

Regarding the trigger, I think we have four options:

Run the code quality periodically - i.e. every few hours or every day. Given that we release at best bi-weekly this type of frequency would be sufficient for us to catch issues and would allow us to merge things quickly. At the same time, doing this would increase the risk of compounding issues if we were to branch out or rebase another feature branch from a main with issues.
Run the code quality after code is pushed into main as a result of a merged PR. Depending on how we set up the CI this would mean a cooldown of ~10 minutes between merged PRs. On the other hand this would ensure that issues are caught as early as possible.
Run the code quality as PR check. Today we are already running the unit tests in this way but while this method would give us high confidence it would also be wasteful and introduce latency since checks are run on every PR update.
Run the code quality right before code is merged on main using merge queues. This would provide the similar benefits to the previous two options, but with two added values: 1/ maintainers don't have to wait for the longer checks to finish before merging one or more PRs, 2/ issues are detected before they are merged to main, thus giving the maintainer an opportunity to fix them without having to create a dedicated issue & additional PR.

At least based on my current understanding of merge queues option 4 seems the most promising option. With that said, this is a relatively new feature and I don't think any of us has had the chance to experiment with it yet.

Acknowledgment

[X] This request meets Powertools for AWS Lambda (TypeScript) Tenets
[ ] Should this be considered in other Powertools for AWS Lambda languages? i.e. Python, Java, and .NET

Future readers

Please react with 👍 and your use case to help us understand customer demand.

Feb 21 '24 20:02 dreamorosi

@aws-powertools/lambda & @aws-powertools/lambda-typescript-core would like to hear your opinion before settling on an option.

Feb 21 '24 20:02 dreamorosi

I agree with running e2e tests as soon and as frequent as possible. I would opt for an approach that would run those tests before merging to main, but the time it takes for the tests to run will create a slow developer feedback and running after merge will catch issues late in the developer cycle.

I think the 4th option with merge queues might be a good approach (I have not used it before). if merge queues can run the e2e tests once on all those queued pull requests that is the way to go. From what I could see merge queues have some tradeoffs specially in terms of speed, because it might take some time to collect the branches to perform the merge queue, best case scenario one pull request will take same time, worst case more pull requests will slow things down a bit, because it has to collect all pull requests.

Another added benefit I get from the docs is "... does not require a pull request author to update their pull request branch and wait for status checks to finish before trying to merge." which is also really good.

Feb 22 '24 11:02 hjgraca

Security was the main concern for running e2e on a PR, thus we did not enable it in the past. Reading merge queues it seems exactly the right tool to solve it. Alternatively, we could merge first into staging branch and run separate checks, which seems to be exactly how merge queues work.

Another topic is how we deal with broken e2e test in a PR from external contributors. We can't expect them to fix the tests without local iteration options (they'd need an AWS account). This would mean we need to take over the PR at the end and make changes. Not sure if there are other options.

Feb 26 '24 16:02 am29d

Over in @aws-powertools/lambda-java-core we are running E2E tests on every PR for 4 different runtimes. We also run on main, once the PR is merged, which is somewhat redundant. That is - we are using (2) and (3) from your options. PR runs are gated so they only run for PRs cut from branches on the primary repository itself and not on PRs from forks, where we have to actively trigger the run after considering the changeset.

We have a fairly large E2E suite and it is at times flaky - often when we catch failures, the failures are incidental (e.g., some timeout the suite hasn't run into before) and not related to the underlying PR at hand. We invest effort in improving the robustness of the E2E suite, but accept that this will inherently always have some element of "randomness" that actual unit tests should lack.

Having said all this I think this is clearly better than not having the suite. When we catch errors, they are errors we would not have otherwise noticed, and are errors that should block a merge.

We've not used merge queues either and I am interested in seeing how you go with them!

Feb 26 '24 20:02 scottgerring

powertools-lambda-typescript powertools-lambda-typescript copied to clipboard

Maintenance: improve continuous code quality automation

Summary

Why is this needed?

Which area does this relate to?

Solution

Acknowledgment

Future readers

powertools-lambda-typescript
powertools-lambda-typescript copied to clipboard