Generate java types from CRD yaml

Open sharebear opened this issue 5 years ago • 2 comments

Reading through some k8s issues regarding CRDs, I spotted the following issue: https://github.com/kubernetes/kubernetes/issues/59151. In summary, by the time CRDs hit GA the plan is for any unknown properties to be stripped from the resource if they aren't defined in the schema in the CRD. If this is the case, that would mean this project would have the full validation defined in both the CRD and the JSON schema document. Would it then make more sense to generate the Java types directly from the CRD? Perhaps with a small tool that wraps https://github.com/OpenAPITools/openapi-generator

Note: there is also a discussion about replacing the OpenAPI schema with something as yet undefined: https://github.com/kubernetes/kubernetes/issues/67840
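
To make the pruning concern concrete, here is a minimal sketch of what "stripping unknown properties" would do to a resource. It is not how the API server implements it; the `PruneSketch` class, the `prune` helper, and the field names (`master`, `instances`, `typoField`) are all made up for illustration, and the schema is modelled as plain nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PruneSketch {

    /**
     * Returns a copy of the resource that keeps only the keys declared under
     * the schema's "properties" entry, recursing into nested objects.
     * Hypothetical helper: it only illustrates the idea of pruning.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> prune(Map<String, Object> resource, Map<String, Object> schema) {
        Map<String, Object> properties =
                (Map<String, Object>) schema.getOrDefault("properties", Map.of());
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : resource.entrySet()) {
            Object propertySchema = properties.get(entry.getKey());
            if (propertySchema == null) {
                continue; // not declared in the schema, so it is dropped
            }
            Object value = entry.getValue();
            if (value instanceof Map && propertySchema instanceof Map) {
                value = prune((Map<String, Object>) value, (Map<String, Object>) propertySchema);
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed schema: an object with one nested object property, "master".
        Map<String, Object> instancesSchema = Map.of("type", "integer");
        Map<String, Object> masterSchema =
                Map.of("type", "object", "properties", Map.of("instances", instancesSchema));
        Map<String, Object> schema =
                Map.of("type", "object", "properties", Map.of("master", masterSchema));

        // Assumed resource: contains two fields the schema does not know about.
        Map<String, Object> master = new LinkedHashMap<>();
        master.put("instances", 1);
        master.put("typoField", "x");
        Map<String, Object> resource = new LinkedHashMap<>();
        resource.put("master", master);
        resource.put("notInSchema", true);

        System.out.println(prune(resource, schema));
    }
}
```

With an incomplete schema in the CRD, any field the operator relies on but the schema omits would disappear in the same way, which is why the two definitions would have to stay in sync.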

sharebear avatar Oct 09 '18 21:10 sharebear

Ah, I see now that I've been reading this the wrong way round. The actual workflow is:

  1. Define configuration in json schema
  2. Generate java types from json schema
  3. Generate CRD based upon java type (assuming CRD does not already exist in cluster)

So the issue here is more about ensuring that the CRD created in initCrds contains a complete schema for the type, so that it keeps working after GA (which should be trivial if the project sticks to openapiv3 as the schema version).
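
As a rough sketch of step 3, a complete schema could be derived from the generated Java type by reflection. This is not the project's initCrds code; `CrdSchemaSketch`, `schemaFor`, and the `Master`/`ClusterSpec` classes are hypothetical stand-ins, and the sketch only covers strings, integers, booleans, and nested objects (no lists, maps, or recursive types).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class CrdSchemaSketch {

    // Hypothetical stand-ins for the types generated from the JSON schema.
    public static class Master {
        Integer instances;
        String memory;
    }

    public static class ClusterSpec {
        Master master;
        String customImage;
        Boolean metrics;
    }

    /** Builds an OpenAPI-v3-style schema, as nested maps, for the given type. */
    public static Map<String, Object> schemaFor(Class<?> type) {
        Map<String, Object> schema = new TreeMap<>();
        if (type == String.class) {
            schema.put("type", "string");
        } else if (type == Integer.class || type == int.class
                || type == Long.class || type == long.class) {
            schema.put("type", "integer");
        } else if (type == Boolean.class || type == boolean.class) {
            schema.put("type", "boolean");
        } else {
            // Anything else is treated as a nested object with one property per field.
            Map<String, Object> properties = new TreeMap<>();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip constants and compiler-generated fields
                }
                properties.put(field.getName(), schemaFor(field.getType()));
            }
            schema.put("type", "object");
            schema.put("properties", properties);
        }
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(ClusterSpec.class));
    }
}
```

Because every field of the type ends up in the schema, nothing the operator reads would be treated as an unknown property. A real implementation would more likely reuse the JSON schema from step 1 directly rather than reflect over the classes.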

sharebear avatar Oct 09 '18 22:10 sharebear

Yes, that's the actual workflow. When I wrote this logic I wasn't aware of this feature; it's well described here. I only learnt about it a month ago. Do you know which version of Kubernetes started supporting it? Ideally, the information about the structure/schema of CRs/CMs should live in only one place.

I would also like to align the operator with OLM. OLM is itself an operator, and it takes care of installing and updating other operators. It requires that the operator has its own CRDs defined in YAML files (not in code, as we do and as the spark operator from GCP does).

So this is kind of a tough issue, because there are multiple factors. It will boil down to the question of what the optimal, single "source of truth" for the schema is, and how to build the toolkit around it to keep it "DRY".

jkremser avatar Oct 10 '18 09:10 jkremser