kopium
Add a brute force option to re-use identical structs
Currently, it's impossible for kopium to detect similar structs re-used at different hierarchies because the schemas all have to be inlined and we cannot use $ref in schemas. https://github.com/kubernetes/kubernetes/issues/62872
Suggestion
We should try to brute-force compare all structs (field names + types) to all previously seen structs in the analyzer to deduplicate so that we don't generate 3 different structs for things such as livenessProbe, resources, MatchLabels (under different names).
We already have the skeletons of such code, but it only checks for a few known ones like Condition here, and it's toggled (enabled by default) under the `--no-condition` flag.
If we pass existing state into the extractor functions (such as extract_container) we can then inspect current state before adding duplicates (ensuring we only take the first instance of a struct) via something like is_duplicate_struct. It would also avoid having an awkward O(n^2) algorithm at the end.
My thinking is that it might not be a particularly heavy operation on (at most) a few hundred kB of schema data.
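As a rough sketch of the idea (all names here are hypothetical, not kopium's actual internals), a duplicate check could compare field names and types against previously seen structs and return the first match's name:

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified view of a generated struct: its name plus its
/// fields mapped to already-resolved Rust type strings.
#[derive(Debug, Clone, PartialEq)]
pub struct GeneratedStruct {
    pub name: String,
    pub fields: BTreeMap<String, String>, // field name -> type
}

/// Returns the name of a previously seen struct whose field names and types
/// all match `candidate`, regardless of the hierarchy-derived struct name.
/// Extractor functions could call this before pushing a new struct, so only
/// the first instance of each shape is kept.
pub fn is_duplicate_struct(
    candidate: &GeneratedStruct,
    seen: &[GeneratedStruct],
) -> Option<String> {
    seen.iter()
        .find(|s| s.fields == candidate.fields)
        .map(|s| s.name.clone())
}
```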
Limitations
This would not solve the problem where we are generating a type for something known; e.g. something like Condition might be deduplicated with this algorithm alone, but it would not point to the correct upstream type in e.g. k8s-openapi without further logic.
This is probably fine, however; it allows overrides to be done more easily (change / elide one struct rather than find-and-replace across multiple), and it's a good stepping stone towards identifying them with upstream variants (as is done for Condition).
More Context
- https://github.com/kube-rs/gateway-api-rs/issues/38
- https://github.com/kube-rs/kube/discussions/1667
@clux please assign me, I'm ready to have a go at this. 🖖
As written, this is valuable for https://github.com/kube-rs/gateway-api-rs/issues/38. Notably however, projects like Gateway API often have types that are shared between multiple CRDs.
As such, I would like to do some experimentation with adding support for handling multiple CRDs at once, searching them collectively for duplicates, and emitting multiple modules. Specifically, I would like to look into taking the current generate/analyze/struct-loop process and running it once per CRD, with a fan-out/fan-in step in the middle to collect and compare types across all CRDs, then emitting a special shared_types.rs module followed by the dependent ${CRD_PLURAL}.rs modules, e.g.:
.
└── src
└── gateway
├── gatewayclasses.rs
├── gateways.rs
├── httproutes.rs
├── mod.rs
└── shared_types.rs
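The fan-in step could be sketched roughly like this (all names hypothetical; kopium's real analyzer state will differ): after each per-CRD pass, a structural fingerprint of every generated struct is counted across CRDs, and any shape seen more than once is promoted to shared_types:

```rust
use std::collections::BTreeMap;

/// Hypothetical per-CRD analysis result: the CRD's plural name plus the
/// structs generated for it, keyed by a structural fingerprint (e.g. a
/// normalized listing of field names and types).
pub struct CrdOutput {
    pub plural: String,
    pub structs: BTreeMap<String, String>, // fingerprint -> struct name
}

/// Fan-in step: any fingerprint seen in more than one CRD is promoted to
/// the shared_types module; the rest stay in their ${CRD_PLURAL}.rs file.
pub fn partition_shared(
    outputs: &[CrdOutput],
) -> (Vec<String>, BTreeMap<String, Vec<String>>) {
    // Count how many CRDs produce each structural fingerprint
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for o in outputs {
        for fp in o.structs.keys() {
            *counts.entry(fp.as_str()).or_default() += 1;
        }
    }
    let mut shared = Vec::new();
    let mut per_crd: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for o in outputs {
        for (fp, name) in &o.structs {
            if counts[fp.as_str()] > 1 {
                // Shape appears in multiple CRDs -> shared_types.rs
                if !shared.contains(name) {
                    shared.push(name.clone());
                }
            } else {
                // Unique to this CRD -> its own module
                per_crd.entry(o.plural.clone()).or_default().push(name.clone());
            }
        }
    }
    (shared, per_crd)
}
```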
This new mode would be opt-in, leaving all the current behavior alone. LMKWYT? 🤔
I think this would be very useful, and a clear upgrade on the original suggestion. Happy to do reviews on sketches.
Your plan sounds sensible to me. A few quick throwaway thoughts on multiple crds:
- this would probably need another input flag that `conflicts_with` both `file` and `crd` (in main), but I see no problem with supporting such a mode, be it for an api group, or a listed set of crds (I usually find myself generating output based on a subset of crds in a group for controllers)
- possibly helpful to have another super-struct in `output.rs` for `Schema` or something that can encapsulate everything you find for one crd (instead of the simple `Output` we currently have), then any algorithm can run on a set of `Schema` objects
Due to time constraints I haven't been able to go at this in earnest and complete a PR. At this point too, we have a workaround for gateway-api-rs. So for now I think it makes more sense to unassign me and let someone else have a go at this if they want. Perhaps if nobody picks it up for a while I can see if I can take another swing at it later.