pulumi Reduce size of typescript input/output definitions files

Team, it looks like the TypeScript definitions embedded in one single package (azure-native) drain tons of resources on the development machine. Is it possible to divide the package into multiple packages, or one plugin per namespace? Something like @pulumi/azure-native-network. Another possible solution would be to stop generating one giant input types definition file and instead generate one per namespace. Thanks

May 19 '21 06:05 XBeg9

any updates here?

Aug 18 '21 08:08 vtereshyn

Moving this one over to pulumi-azure-native, as this issue is largely specific to use of that package which is particularly large.

Aug 18 '21 15:08 lukehoban

Cross referencing to https://github.com/pulumi/pulumi/issues/7653 which is related to this - and would address some of the performance impacts related to this.

Aug 18 '21 15:08 lukehoban

I moved this issue to pulumi/pulumi as that's where the TypeScript SDK generation is implemented

Dec 20 '21 15:12 mikhailshilkov

This issue is mentioned in https://github.com/pulumi/pulumi-azure-native/issues/932 which is currently the second highest up-voted issue for azure-native.

Aug 12 '22 12:08 danielrbradley

Just to explicitly provide an update for the community members following along, this work is slated for implementation. At first blush, it seems likely we'll be able to build a types file for each resource, but I'm unsure how we can keep the types defined at each index.d.ts file from ballooning memory on load.

Sep 15 '22 11:09 RobbieMcKinstry

@RobbieMcKinstry I don't want to repeat myself, but there is still the option of dividing azure-native into smaller packages like @pulumi/azure-native-network, @pulumi/azure-native-containerregistry, etc. It's very rare that you need everything at once in one single repo. This could easily be published using a monorepo tool like https://github.com/lerna/lerna; you just need to change the way you generate the SDK, exactly as @mikhailshilkov said.

Sep 15 '22 11:09 XBeg9

Thanks for reiterating, @XBeg9. I think you're definitely on to something! Personally, I have to imagine that's a change we'd want to make in the long run, but I'm not on a team with ownership of Azure Native, so my say matters little compared to the experts :) You're right; virtually nobody needs more than a smattering of the resources offered by each cloud provider, and it's odd to make all users pay the same price in terms of memory/disk.

Specifically for Azure Native, I see that the issue of splitting packages is already tracked here. It looks like @danielrbradley has already started on some blocking groundwork here. There's also precedent for doing a similar cleaving of the Go SDK that Daniel recently landed. Not what we're discussing here but in the same ballpark.

My understanding is that the Platform team is using this issue to track the splitting of node_modules/types/output.d.ts and similar files into individual type declaration files for each namespace. We suspect that if we can split the type definitions, we can improve performance across all cloud providers, even without package splitting.

Sep 15 '22 12:09 RobbieMcKinstry

👋🏻 Hello! I wanted to follow up with some additional information about the change that landed which closed this issue. In https://github.com/pulumi/pulumi/pull/10831 we changed code generation to output many smaller files instead of a few massive ones.

Background

As of v3.46.1 (the current release at the time of writing), types for each provider are written to types/input.ts or types/output.ts for input types and output types, respectively. Each of those files has namespaces corresponding to the package's modules. This means that if you want the input types for aws/s3, you need to open up types/input.ts, read in the entire file, and access the namespace aws.s3.
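To make the single-file layout concrete, here is a minimal sketch of what a file shaped like types/input.ts looks like. The type names BucketWebsite and InstanceTag are made up for illustration and are not taken from any real provider; a real generated file holds thousands of such declarations.

```typescript
// Hypothetical, heavily trimmed stand-in for a generated types/input.ts.
// BucketWebsite and InstanceTag are illustrative names, not real provider types.
export namespace s3 {
    export interface BucketWebsite {
        indexDocument?: string;
    }
}

export namespace ec2 {
    export interface InstanceTag {
        key: string;
        value: string;
    }
}

// A consumer that only needs one s3 type still makes the compiler
// read and check every namespace declared in this file.
const website: s3.BucketWebsite = { indexDocument: "index.html" };
console.log(website.indexDocument);
```

Because everything lives in one module, there is no way to ask for s3 alone; the cost of the file grows with the size of the whole provider.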

Filesystem Diagram, Before

This Change

This will change when https://github.com/pulumi/pulumi/pull/10831 is released (next week, barring regressions). Each top-level namespace previously in input.ts now corresponds to a subdirectory in types. Those subdirectories also contain input.ts and output.ts files, which describe just the input/output types declared at that level of granularity and re-export the types from deeper subfolders.

Filesystem Diagram, After

This is all backward-compatible with the current API. And, I believe, it does not currently offer a performance benefit. All of our input.ts and output.ts files (and resource files for that matter) still import the top-level types/input.ts. So even if you changed your import to import { Foo } from "aws/types/s3/input", which is a smaller file, you won't see a performance benefit, because that input file will still import aws/types/{input, output}.ts which imports everything.
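The reasoning above can be checked with a toy model of the import graph. This is only a sketch: the file names follow the layout described in this comment, the iam entry and the exact edges are assumptions, and the "localized" graph is a guess at the end state rather than what any release actually generates.

```typescript
// Toy model: each file maps to the list of files it imports.
type ImportGraph = Record<string, string[]>;

// Collect every file that ends up loaded when `entry` is imported.
function loadedFiles(graph: ImportGraph, entry: string): string[] {
    const seen: Record<string, boolean> = {};
    const stack = [entry];
    while (stack.length > 0) {
        const file = stack.pop() as string;
        if (seen[file]) continue;
        seen[file] = true;
        for (const dep of graph[file] || []) stack.push(dep);
    }
    return Object.keys(seen).sort();
}

// After the split, but with every file still importing the top-level types/input.ts.
const splitOnly: ImportGraph = {
    "types/input.ts": ["types/s3/input.ts", "types/ec2/input.ts", "types/iam/input.ts"],
    "types/s3/input.ts": ["types/input.ts"],
    "types/ec2/input.ts": ["types/input.ts"],
    "types/iam/input.ts": ["types/input.ts"],
};

// Assumed end state: per-namespace files no longer import the top-level file.
const localized: ImportGraph = {
    "types/input.ts": ["types/s3/input.ts", "types/ec2/input.ts", "types/iam/input.ts"],
    "types/s3/input.ts": [],
    "types/ec2/input.ts": [],
    "types/iam/input.ts": [],
};

const splitOnlyCount = loadedFiles(splitOnly, "types/s3/input.ts").length;
const localizedCount = loadedFiles(localized, "types/s3/input.ts").length;
console.log(splitOnlyCount, localizedCount);
```

In the split-only graph, importing the small s3 file still reaches every other file through types/input.ts, while in the localized graph it reaches only itself.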

So, I don't expect this change to improve performance. While I haven't run any benchmarks on it yet, we will be monitoring the performance going forward, including before release.

Next Steps

It sucks that this change shouldn't improve performance. ☹️ However, fixing this issue was a blocker for https://github.com/pulumi/pulumi/issues/10442 which will improve performance in exactly the ways we're discussing here. #10442 proposes making our compiler smart enough to localize imports. Let's look at some code:

This is taken from https://github.com/pulumi/pulumi-aws:/ec2/instance.ts

import * as pulumi from "@pulumi/pulumi";
import * as inputs from "../types/input";
// ...
export interface InstanceState {
    // ...
    capacityReservationSpecification?: pulumi.Input<inputs.ec2.InstanceCapacityReservationSpecification>;
    // ...
}

In this snippet, we see that our compiler generates code that imports all input types by pulling in the types/input module. Since our API is backward-compatible after https://github.com/pulumi/pulumi/pull/10831, this file re-exports all of the input files beneath it in the directory tree. Then, when we reference inputs.ec2.InstanceCapacityReservationSpecification, we're dereferencing the types/input.ts module, then the ec2 namespace inside it, and finally InstanceCapacityReservationSpecification.

The primary change that https://github.com/pulumi/pulumi/issues/10442 will introduce which will provide a major performance win is that it will change the code to look like this:

import { InstanceCapacityReservationSpecification } from "../types/input/ec2";
export interface InstanceState {
    capacityReservationSpecification?: pulumi.Input<InstanceCapacityReservationSpecification>;
}

(This isn't the exact syntax; it's still WIP.)

This lets us skip importing the entire list of types from aws and go straight to ec2 (or maybe even ec2/input.ts).

However, implementing this is no small feat. It requires adding a compiler pass to identify which imports are required for each namespace. And the blast radius isn't limited to the types directory -- as demonstrated above this optimization will apply to resource definitions too.
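As a rough illustration of what such a pass has to compute, here is a toy sketch that groups qualified type references by namespace and emits one named import per namespace. It is not Pulumi's code generator: the function name localizeImports is invented, the "inputs" alias and the output path shape are copied from the WIP snippet above, and InstanceEbsBlockDevice is used here only as a sample second reference.

```typescript
// Toy "import localization" pass. Input: references as they appear in generated
// code today, e.g. "inputs.ec2.InstanceCapacityReservationSpecification".
// Output: one named import statement per namespace (path shape is an assumption).
function localizeImports(refs: string[]): string[] {
    const byNamespace: Record<string, string[]> = {};
    for (const ref of refs) {
        const parts = ref.split(".");
        // Expect exactly <alias>.<namespace>.<TypeName>; skip anything else.
        if (parts.length !== 3 || parts[0] !== "inputs") continue;
        const ns = parts[1];
        const typeName = parts[2];
        const names = byNamespace[ns] || (byNamespace[ns] = []);
        // Deduplicate: a type referenced many times is imported once.
        if (names.indexOf(typeName) === -1) names.push(typeName);
    }
    return Object.keys(byNamespace).sort().map((key) => {
        const names = byNamespace[key].slice().sort().join(", ");
        return `import { ${names} } from "../types/input/${key}";`;
    });
}

const generated = localizeImports([
    "inputs.ec2.InstanceCapacityReservationSpecification",
    "inputs.ec2.InstanceEbsBlockDevice", // sample second reference
    "inputs.ec2.InstanceCapacityReservationSpecification", // duplicate on purpose
]);
console.log(generated.join("\n"));
```

A real pass would work on the schema or syntax tree rather than on strings, and would also have to handle output types, nested namespaces, and name collisions between namespaces, which is part of why the change is not small.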

I can't make any promises, but I'd like to break ground on https://github.com/pulumi/pulumi/issues/10442 by the end of the month. If I miss that window, however, it's possible the work won't be scheduled, since we've budgeted this quarter for Performance Improvements and it's unclear whether time will be allocated for this work in Q1 2023.

Nov 11 '22 19:11 RobbieMcKinstry

I regret to say, we have to reopen this issue. The PR which closed this issue has been reverted due to a bug in TSC. I'm tracking the work needed to get this back into place in a new issue which I'll link below in a few minutes.

Dec 06 '22 15:12 RobbieMcKinstry

Here's the tracking issue for the related changes.

Dec 06 '22 17:12 RobbieMcKinstry

Hi! Do you happen to have any updates on this?

Mar 22 '23 10:03 vtereshyn

Hi @vtereshyn, after backing out my previous PR, we realized we could improve the API when making this change, so we tacked on some extra work to coincide with a repaired version of the original PR. That's all linked in the tracking issue. When Q1 2023 started, I told my manager that I needed to park this issue for a time (I was burning out from looking at the same code for so long). Now that we're entering Q2, I feel better about tackling it, and I'm hoping we'll have some bandwidth to schedule it for Q2. The majority of the work is in three steps:

1. Rebasing the reverted PR to get caught back up with master.
2. Adding a toggle to allow providers to incrementally enable this feature.
3. Moving the split files from /types to be colocated with their resources. This is definitely the biggest component.

Mar 22 '23 16:03 RobbieMcKinstry

@RobbieMcKinstry – thank you for the update.

I feel this will be a huge step forward and a significant improvement. I am looking forward to seeing that in Q2.

Mar 23 '23 10:03 vtereshyn

My pleasure! Always happy to chat!

Mar 23 '23 14:03 RobbieMcKinstry