terraform-cdk icon indicating copy to clipboard operation
terraform-cdk copied to clipboard

Improve performance of cdktf get

Open onitake opened this issue 4 years ago • 11 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

When using cdktf synth as part of a CI/CD pipeline, it is currently required to either run cdktf get on every pipeline run, or cache the imports folder in some way.

Unfortunately, the get command is very slow, due to repacking and recompilation of the Terraform providers. This is especially painful when using one of the "secondary" languages, such as Python. I experienced pipeline runs of over 5 minutes with only the AWS provider loaded.

The second option (caching) is non-trivial, because the build environment might not be identical and there doesn't seem to be an efficient way to check for updated providers at this point.

I also tried checking the imports folder into the build repository, but this failed because the generated code seems to platform dependent. The contents were different between cdktf get runs on Windows and Linux machines.

It would be much better if get could instead download precompiled providers from a repository, similar to how Terraform downloads its providers from a Hashicorp (or local) file server.

References

There is an open request for an efficient update command: #578

If this gets implemented, the imports folder could be cached by the CI/CD pipeline, and a quick update check could be done before running the synth command. I'm thinking about storing the the imports folder as a build artifact, which is loaded into the pipeline before it gets executed.

onitake avatar Jun 23 '21 10:06 onitake

If you are just using the aws provider (or a few others), you could use pre-built provider packages hosted on PyPi (or similar for other languages).

The contents were different between cdktf get runs on Windows and Linux machines.

Are there more differences than just line endings? Likely something that could be improved if so.

jsteinich avatar Jun 23 '21 13:06 jsteinich

If you are just using the aws provider (or a few others), you could use pre-built provider packages hosted on PyPi (or similar for other languages).

Ok, very cool!

Would it be enough to just add a dependency to the Pipfile, or is there something that cdktf needs as well?

The contents were different between cdktf get runs on Windows and Linux machines.

Are there more differences than just line endings? Likely something that could be improved if so.

As a matter of fact, there is:

[Container] 2021/06/23 09:58:56 Running command cdktf synth
cdktf encountered an error while synthesizing

Synth command: pipenv run python main.py
Error:         non-zero exit code 1

Command output on stderr:

    jsii.errors.JavaScriptError: 
      internal/modules/cjs/loader.js:303
            throw err;
            ^
  
      Error: Cannot find module '/tmp/jsii-kernel-7PD2Kw/node_modules/hashicorp_aws/providers\aws\index.js'. Please verify that the package.json has a valid "main" entry
          at tryPackage (internal/modules/cjs/loader.js:295:19)
          at Function.Module._findPath (internal/modules/cjs/loader.js:508:18)
          at Function.Module._resolveFilename (internal/modules/cjs/loader.js:802:27)
          at Module._load (internal/modules/cjs/loader.js:667:27)
          at nodeRequire (/tmp/tmplj38tl2q/lib/program.js:7809:42)
          at /tmp/jsii-kernel-7PD2Kw/node_modules/hashicorp_aws:1:1
          at Script.runInContext (vm.js:130:18)
          at Kernel._execute (/tmp/tmplj38tl2q/lib/program.js:8588:31)
          at Kernel.load (/tmp/tmplj38tl2q/lib/program.js:7856:38)
          at KernelHost.processRequest (/tmp/tmplj38tl2q/lib/program.js:9479:36)

The imports were created on a Windows machine, and the synth command was executed on AWS CodeBuild in a Ubuntu build container. Looks like the generated code contains some Windows path separators that don't work on Linux...

onitake avatar Jun 23 '21 13:06 onitake

Would it be enough to just add a dependency to the Pipfile, or is there something that cdktf needs as well?

Yep

jsteinich avatar Jun 23 '21 14:06 jsteinich

Hmm, that doesn't seem to be enough for cdktf:

[Container] 2021/06/23 14:35:58 Running command cdktf synth
ERROR: synthesis failed, run "cdktf get" to generate providers in imports

If I execute the Python script directly instead:

[Container] 2021/06/23 14:41:42 Running command pipenv run ./main.py
Traceback (most recent call last):
  File "/...../main.py", line 4, in <module>
    from imports.aws import AwsProvider
ModuleNotFoundError: No module named 'imports'

This happens because the Python script references the generated imports folder. I can address this by replacing from imports.aws import AwsProvider, ... with from cdktf_cdktf_provider_aws import AwsProvider, ... - looks a bit awkward, but I suppose it's intended that way?

onitake avatar Jun 23 '21 15:06 onitake

I can confirm that this works as expected.

Added benefit: I won't even need to install cdktf-cli into the build container any more. :smiley:

onitake avatar Jun 23 '21 18:06 onitake

Added benefit: I won't even need to install cdktf-cli into the build container any more. 😃

Running without the cli isn't exactly the same as running your app directly. Currently this only impacts Terraform Cloud usage and some resource naming, but there may be further differences in the future.

jsteinich avatar Jun 23 '21 21:06 jsteinich

Ok... that does make sense. It would be great have a full overview of the differences somewhere, though. Which resources would have naming issues?

In any case, I don't think I can use cdktf-cli in a deployment pipeline as it is now. The cdktf get requirement and the time it consumes on every run is just a bit too much.

The roadmap for cdktf is obviously at Hashicorp's discretion, but if I had a say, I wouldn't mind not having a dependency on an additional CLI tool. The most important bits are in the CDK and the providers, and the deployment is handed off to Terraform anyway. cdktf-cli is useful to help create new projects, determine dependencies and generate the language-specific CDKs and providers, but if it doesn't provide much additional value during deployment runs, why not put the added logic into the CDK or Terraform itself instead?

onitake avatar Jun 24 '21 08:06 onitake

It would be great have a full overview of the differences somewhere, though

Agreed. https://github.com/hashicorp/terraform-cdk/issues/798

Which resources would have naming issues?

Resources that are direct children of the stack and those with names containing hyphens or underscores.

The cdktf get requirement and the time it consumes on every run is just a bit too much.

The differences I mentioned are all related to cdktf synth. cdktf get doesn't need to be used.

why not put the added logic into the CDK or Terraform itself instead?

There are cases where that makes sense, but likely also times where restrictions within the library make that difficult.

jsteinich avatar Jun 24 '21 21:06 jsteinich

It would be great have a full overview of the differences somewhere, though

Agreed. #798

Thank you!

I'll comment there if I encounter any particular issues.

Which resources would have naming issues?

Resources that are direct children of the stack and those with names containing hyphens or underscores.

:thinking: That means I'll probably have to run a few diffs and compare the output.

The cdktf get requirement and the time it consumes on every run is just a bit too much.

The differences I mentioned are all related to cdktf synth. cdktf get doesn't need to be used.

Yes, ok. But as I mentioned above, synth refuses to work without doing a get, or installing the imports some other way. It's not enough to only create a pipenv with the Python dependencies.

ERROR: synthesis failed, run "cdktf get" to generate providers in imports

onitake avatar Jun 25 '21 08:06 onitake

Yes, ok. But as I mentioned above, synth refuses to work without doing a get, or installing the imports some other way. It's not enough to only create a pipenv with the Python dependencies.

ERROR: synthesis failed, run "cdktf get" to generate providers in imports

Could you post your Pipfile and cdktf.json here? Curious to see what might cause this.

skorfmann avatar Jun 30 '21 08:06 skorfmann

Sure!

Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[requires]

[dev-packages]

[packages]
cdktf = "~=0.4.1"
boto3 = "~=1.17.97"
cdktf-cdktf-provider-aws = "~=1.0.103"

cdktf.json:

{
  "language": "python",
  "app": "pipenv run python main.py",
  "terraformProviders": ["hashicorp/aws@~> 3.26"],
  "terraformModules": [],
  "codeMakerOutput": "imports",
  "context": {
    "excludeStackIdFromLogicalIds": "true",
    "allowSepCharsInLogicalIds": "true"
  }
}

onitake avatar Jul 01 '21 14:07 onitake

Hi 👋 Since this issue was posted, we have made pre-built providers available: https://cdk.tf/prebuilt-providers

We also improved the performance of cdktf get in separate (recent) releases which is why I'll close this old issue. That said, if you encounter any new performance related issues, don't hesitate to create a new issue!

ansgarm avatar Oct 06 '22 10:10 ansgarm

I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues. If you've found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Nov 24 '22 01:11 github-actions[bot]