pulumi-kubernetes icon indicating copy to clipboard operation
pulumi-kubernetes copied to clipboard

Fix: Missing Kind in preview mode (yaml/v2)

Open EronWright opened this issue 1 year ago • 4 comments
trafficstars

Proposed changes

The yaml/v2 package eagerly resolves kinds to apply a default namespace. This PR addresses a problem with the use case where a kind is installed by one ConfigGroup and then used by another. In preview mode, the kind cannot be found by the latter group because the expected side-effect (i.e. installation of a CRD) doesn't occur.

The solution to the issue is to maintain a cache in the provider of new CRDs. When the provider executes Check on a CustomResourceDefinition resource, it adds the CRD to the cache. Later, when a component resource must resolve a given kind, it checks the cache. In this way, the cache compensates for the lack of side-effects (i.e. API server calls) that would normally allow the kind to be resolved. Overall ordering is established using DependsOn.

It is reasonable to populate the cache during Check because a)Check produces the "planned" state that is needed later without actuating it, and b) Update is called conditionally and thus isn't a reliable alternative.

Implementation Details

The provider has three overlapping modes of operation - preview-mode, clusterUnreachable and yamlRenderMode - that affect the initialization and use of clientSet. Previously when clusterUnreachable is true then the clientSet is left null. Now, we new-up the clientSet itself but leave its various fields null. This change makes it possible to use the CRD cache in all modes, as was necessary to support yamlRenderMode.

Testing

A new integration test is added called yamlv2 that exercises this scenario.

Manual testing of "cluster unreachable mode" was done by misconfiguring my local environment (to have an invalid kube context). Tested in combination with "yaml rendering mode" and found that the latter works as expected even when the cluster is unreachable.

Related issues (optional)

EronWright avatar Mar 22 '24 20:03 EronWright

Does the PR have any schema changes?

Looking good! No breaking changes found. No new resources/functions.

github-actions[bot] avatar Mar 22 '24 20:03 github-actions[bot]

Codecov Report

Attention: Patch coverage is 41.79104% with 39 lines in your changes are missing coverage. Please review.

Project coverage is 27.63%. Comparing base (a15641e) to head (28c6912).

Files Patch % Lines
provider/pkg/clients/cache.go 50.00% 13 Missing and 5 partials :warning:
provider/pkg/clients/clients.go 36.84% 11 Missing and 1 partial :warning:
provider/pkg/provider/provider.go 22.22% 6 Missing and 1 partial :warning:
provider/pkg/provider/invoke_decode_yaml.go 0.00% 1 Missing :warning:
provider/pkg/provider/invoke_helm_template.go 0.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2904      +/-   ##
==========================================
+ Coverage   27.55%   27.63%   +0.08%     
==========================================
  Files          53       54       +1     
  Lines        7818     7862      +44     
==========================================
+ Hits         2154     2173      +19     
- Misses       5485     5504      +19     
- Partials      179      185       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 23 '24 01:03 codecov[bot]

I had a discussion with @Frassle where he expressed support but advised me to submit some tests into pu/pu to harden the contract.

EronWright avatar Mar 26 '24 22:03 EronWright

I don't have enough context to feel strongly one way or the other. I do prefer the UX, but my default position is to usually err on the side of simpler-is-better when brushing up against system boundaries like this. If folks from the Platform side are on board with this then it seems like a no brainer.

There are a couple of arguments against this (or adding state anywhere in providers for that matter).

  1. Is it makes debug attach trickier because you probably now need a clear "Reset" signal to tell the long running attached provider process that it's time to clear its state and start again because a new deployment cycle is happening.
  2. It's less performant because we can't share the provider process across the preview and up phases of up (well I suppose we could if we had a supported Reset method like for 1).

But I suspect providers can do much smarter previews if they are stateful, and that's probably worth the downsides of the above.

Frassle avatar Mar 26 '24 22:03 Frassle

Just want to mention that the Cancel RPC method could maybe serve as a reset signal. Today, Cancel is not called unless the user cancels the deployment, but it is probably harmless to call at the end of every run, and could reset the provider state for another run. To the point about attaching to a long-running provider, cancellation is a further complication to that story unless the provider takes pains to 'reset' (p-k doesn't; it simply closes a channel).

EronWright avatar Mar 27 '24 21:03 EronWright