Improve function execution time
Describe your problem
Pipeline execution times are longer than expected for simple functions. For example, gcr.io/kpt-fn/set-namespace:v0.1 against a folder with a cert-manager installation manifest (including large CRDs) takes about 5s on my machine.
This includes roughly:
- 1.7s doing docker pull
- 1s in copyCommentsAndSyncOrder, because we fetch the open api schema for each resource
- 1.7s to actually run the function. Even docker run alpine takes ~0.7s on my machine.
- 0.3s doing misc tasks
Opening this issue to track performance improvements. So far it seems like changing the default image-pull-policy to match Kubernetes behavior would be a big contributor; copyCommentsAndSyncOrder seems like another candidate for optimization.
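For reference, here is a rough sketch of the rule Kubernetes applies when imagePullPolicy is omitted: Always for an untagged or :latest image, IfNotPresent for any other tag or for a digest. The function name defaultPullPolicy is hypothetical and this is not kpt's actual code; it only illustrates what "match Kubernetes behavior" could mean for a pinned image such as gcr.io/kpt-fn/set-namespace:v0.1.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy is a hypothetical helper that mirrors the Kubernetes
// default: images without a tag or tagged "latest" are always pulled,
// anything pinned to another tag or a digest is pulled only if missing.
func defaultPullPolicy(image string) string {
	// A digest (name@sha256:...) pins the content, so a cached copy is fine.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// A colon only marks a tag if it comes after the last slash;
	// otherwise it belongs to a registry port such as localhost:5000.
	lastSlash := strings.LastIndex(image, "/")
	lastColon := strings.LastIndex(image, ":")
	if lastColon <= lastSlash {
		return "Always" // no tag means "latest"
	}
	if image[lastColon+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{
		"gcr.io/kpt-fn/set-namespace:v0.1",
		"alpine",
		"localhost:5000/alpine",
	} {
		fmt.Println(img, "->", defaultPullPolicy(img))
	}
}
```

With a rule like this, a versioned function image would skip the docker pull step whenever it is already cached locally.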
Seems like 10% of CPU time (i.e. not including docker run time) is spent on reflect.DeepEqual on spec.Schema, always comparing it to spec.Schema{}. Maybe a smarter IsEmpty could improve this.
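To illustrate the idea, here is a minimal sketch comparing a reflection-based emptiness check with a field-by-field one. The schema type below is a hypothetical, much smaller stand-in for spec.Schema, and both function names are made up for this example; the real struct has many more fields, so a real IsEmpty would need to cover all of them.

```go
package main

import (
	"fmt"
	"reflect"
)

// schema is a hypothetical stand-in for spec.Schema with only a few fields.
type schema struct {
	ID          string
	Description string
	Required    []string
	Properties  map[string]schema
	Items       *schema
}

// isEmptyDeepEqual is the slow path: reflection walks every field.
func isEmptyDeepEqual(s *schema) bool {
	return reflect.DeepEqual(*s, schema{})
}

// isEmptyFast checks each field against its zero value directly.
// Slices and maps are compared to nil so the result matches
// reflect.DeepEqual, which treats nil and empty-but-non-nil as different.
func isEmptyFast(s *schema) bool {
	return s.ID == "" &&
		s.Description == "" &&
		s.Required == nil &&
		s.Properties == nil &&
		s.Items == nil
}

func main() {
	empty := &schema{}
	filled := &schema{Description: "a namespace"}
	fmt.Println(isEmptyDeepEqual(empty), isEmptyFast(empty))
	fmt.Println(isEmptyDeepEqual(filled), isEmptyFast(filled))
}
```

The direct comparison avoids reflection entirely, at the cost of having to be updated whenever a field is added to the struct.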
profile.tar.gz: a profile of set-namespace running. The 37% (0.5s) spent in IsCertainlyClusterScoped is suspicious. I don't even think we actually use the result of that function...
Dropping IsCertainlyClusterScoped cuts execution time from 2.3s to 0.7s. Big improvement! But we do use it in GetMatchingResourcesByCurrentId for resid.ResId.Equals. Seems like something we could improve though.
If we really need to read the full openapi.parseBuiltinSchema, maybe we can prebuild a go struct (autogenerated?) instead of parsing json.
Or just precompute a map of GVK -> bool (cluster scoped or not). A test can keep it in sync with the schema.
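A minimal sketch of what such a precomputed table might look like, assuming it would be generated from the builtin schema ahead of time. The gvk type, the clusterScoped map, and the lowercase isCertainlyClusterScoped function are hypothetical names for this example, and the table holds only a handful of well-known kinds rather than the full list.

```go
package main

import "fmt"

// gvk is a hypothetical key type identifying a resource kind.
type gvk struct {
	Group, Version, Kind string
}

// clusterScoped is a hand-written sample of what a generated table could
// contain: true for cluster-scoped kinds, false for namespaced ones.
var clusterScoped = map[gvk]bool{
	{Group: "", Version: "v1", Kind: "Namespace"}:                                       true,
	{Group: "", Version: "v1", Kind: "Node"}:                                            true,
	{Group: "rbac.authorization.k8s.io", Version: "v1", Kind: "ClusterRole"}:            true,
	{Group: "apiextensions.k8s.io", Version: "v1", Kind: "CustomResourceDefinition"}:    true,
	{Group: "", Version: "v1", Kind: "ConfigMap"}:                                       false,
	{Group: "apps", Version: "v1", Kind: "Deployment"}:                                  false,
}

// isCertainlyClusterScoped returns true only for kinds known to be cluster
// scoped. Unknown kinds (for example custom resources) return false, since
// a missing map key yields the zero value.
func isCertainlyClusterScoped(k gvk) bool {
	return clusterScoped[k]
}

func main() {
	fmt.Println(isCertainlyClusterScoped(gvk{"", "v1", "Namespace"}))
	fmt.Println(isCertainlyClusterScoped(gvk{"apps", "v1", "Deployment"}))
	fmt.Println(isCertainlyClusterScoped(gvk{"cert-manager.io", "v1", "Certificate"}))
}
```

A lookup like this is a single map access, so no JSON schema has to be parsed at run time; the sync test mentioned above would compare the table against the parsed schema once, in CI.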
See also #2304
For inspiration regarding function performance, especially kpt fn render performance, take a look at https://cdk8s.io/. Real-time hydration.
Does it work with crictl? The current documentation says: KPT_FN_RUNTIME: The runtime to run kpt functions. It must be one of "docker", "podman" and "nerdctl".