terraform-cdk
terraform-cdk copied to clipboard
feat(provider-generator): namespace every resource / data source / provider
This allows reducing the compile time across all languages since users only need to import the used subset in the provider bindings instead of the entire one. Namespaces force JSII to split the generated code up into multiple files that can be individually parsed and used.
I also put the provider in a namespaced and renamed the file from e.g. aws-provider.ts to provider.ts so that import paths are a bit nicer and namespaces are aws.provider.AwsProvider vs aws.awsProvider.AwsProvider.
BREAKING CHANGE: This changes the generated API surface a lot, if you use an already namespaced provider (read: AWS) you only need to import the provider declaration differently. For any other provider the imports are changing so that users would need to change the imports for their resources / data sources to use namespaces or barrell imports.
TODOs
- [x] Adjust convert to work with the namespaces
- [x] Make Unit Tests pass
- [x] hcl2cdk: data local_file
- [x] hcl2cdk: complex for each loops
- [x] hcl2cdk: data references
- [x] hcl2cdk: simple data source
- [x] hcl2cdk: null provider
- [x] hcl2cdk: convert project (succeeds but synth fails)
- [ ] Fix Integration Tests
- [ ] Fix Examples
- [ ] Document breaking change and migration guide
- [ ] Update all the docs
- [ ] Documentation: add "Use specific / namespaced imports" in best practice section
- [ ] Add doc string for each of the namespaces with a link to the TF docs
This is a tough one. I certainly understand the motivation for doing this; however, the usage is definitely less clean.
I measured through a subset of the examples to give us an idea of the improvements:
| example | get before | get after | get delta | synth before | synth after | synth delta |
|---|---|---|---|---|---|---|
| Python AWS (namespaced) | 137s | 113s | -17% | 27s | 22s | -18% |
| Python Azure | 35s | 34s | -2% | 10s | 8s | -20% |
| C# AWS (namespaced) | 80s | 79s | -1% | 35s | 16s | -54% |
| C# Azure | 66s | 70s | +6% | 35s | 14s | -60% |
| Go AWS (namespaced) | 341s | 322s | -5% | 20s | 15s | -25% |
| Go Azure | 247s | 247s | 0% | 22s | 9s | -59% |
This is a tough one. I certainly understand the motivation for doing this; however, the usage is definitely less clean.
Yeah I agree, it's a trade-off. We will need to explore which APIs get worse / better and how much it impacts the compilation speed. In an ideal world I'd like to have the current API surface (and more) and fast synth / get times at the same time, but this seems to be impossible. This is one way we are looking into the issue, we'll also improve JSII directly, but that will most likely only affect the get times, not the synth times. One positive impact of this was autocompletion for python was working for the first time for me 🎉 I think the sub-modules make it easier for the language server to follow
Were the synth benchmarks for locally generated providers? Would be interesting to compare with the published bindings.
While JSII has some fundamental slowness to it, we've seen some glaring performance issues so I'm optimistic that there is room for significant improvement there.
Since this seems to be primarily an optimization, starting with profiling data to determine what aspects are slow should give more insight into the problem and better inform the choice.
Were the synth benchmarks for locally generated providers? Would be interesting to compare with the published bindings.
Locally, I haven't tested pre-built yet, but I think we need to. in the old bindings there was already a stark contrast between pre-built and local providers.
While JSII has some fundamental slowness to it, we've seen some glaring performance issues so I'm optimistic that there is room for significant improvement there.
Yeah, we are also looking into the JSII performance, mostly for the cdktf get performance.
Since this seems to be primarily an optimization, starting with profiling data to determine what aspects are slow should give more insight into the problem and better inform the choice.
Do you mean profiling the synth operation?
Do you mean profiling the synth operation?
Yep. Should also profile the get operation, but that didn't feel like the focus of this PR.
Quick analysis of the impact of this change to user experience:
- All languages
- Users will generally require more import statements. I tend to let the IDE handle generating them so not that big of a deal, but for users hand writing them or looking at examples it will be a negative.
- Having the provider in it's own namespace (and not the root) is a bit harder for getting started.
- C#
- Nothing really additional to note
- Go
- Definitely looks clunky with starting from an arbitrary module name rather than the provider name. I haven't really done go development, so maybe this is common
- Java
- Having underscores in package names isn't convention, though that's not really new to this
- Python
- Nothing really additional to note
- Typescript
- More or less forces barrel imports (otherwise really strange qualifying), but that's probably not a big deal
I'm still curious to see more profiling details. I did find https://github.com/aws/jsii/issues/3389 in which users have done some profiling with Python and Typescript for the AWS CDK. The results were a bit surprising which is why I think it is valuable; pre-conceptions can be wrong when it comes to performance. It also sounds like the JSII team is making progress towards addressing some of the issues.
We don't want to negatively impact get performance; however, improving that seems like a separate endeavor. Good to have benchmarks to check though.
Pre-built providers are generally precompiled which is generally a big win for performance versus local providers. JSII does have additional implications so this may have unexpected benefits there.
@jsteinich here is the performance analysis I did over our examples and e2e examples: CDKTF Provider bindings profiling.pdf
Besides this we also see these other improvements
- python code completion works again (previously the language server crashed with OOM when using aws / azure / google)
- users working with L1 constructs (so currently most of our users) need to think about terraform resources anyways, so having our generated classes grouped by the resource names makes it easier to find things in the auto completion and it helps users learn directly about the Terraform resources
- we plan to add new classes per resource mid-term (e.g. for preconfigured iterators), we ran into compilation limits in the past when we had too many things in one namespace, so
Improving get is a separate endeavour, we could improve the performance a bit by parallelizing parts of it (releasing in 0.12.3) and we improved the go generation drastically (in JSII 1.67.0). Now the biggest offender is TS, but that is on the list for the JSII team.
RE Add doc string for each of the namespaces with a link to the TF docs: The docs are only picked up if you put them into a README.md in the folder of the namespace. The thing is right now we don't have a folder for the namespace, so this is not doable right now without changing this. Should have no effect besides adding the documentation to the namespace, but it's more work than I can finish today.
I'm going to lock this pull request because it has been closed for 30 days. This helps our maintainers find and focus on the active issues. If you've found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.