dotnet-kube-client icon indicating copy to clipboard operation
dotnet-kube-client copied to clipboard

Handling of default values

Open felixfbecker opened this issue 7 years ago • 84 comments

I got diffing objects working well so far, but there is one problem: Several properties in Kubernetes objects are scalars with default values. For example:

  • int ProbeV1.FailureThreshold defaults to 3
  • string ServicePortV1.Protocol defaults to "TCP" and many more. These properties are often not declared in YAML configuration. But when deserializing the YAML, the properties get the C# type default value, instead of the server default value. I.e. FailureThreshold is always 0, Protocol is always null. When diffing against the server, this will then generate a patch that sets these values to the C# default value (0/null).

How should we handle this?

  • When diffing, check if the new value is the C# default value, and exclude it from the patch. This would make it impossible to actually set a field to 0 or null. Does the API have fields anywhere that can be set to the C#/Go default value? E.g. for FailureThreshold the minimum is 1, and in other places where 0 is valid I saw them use nullable/pointer types, e.g. ConfigMapVolumeSource.DefaultMode. But there are a lot of booleans that are not nullable, and this would make it impossible to switch a boolean back to false after it was set to true on the server.
  • Apply the server defaults client side. This would break if they ever change the default. The defaults are only exposed through documentation, not in the schema.
  • Make everything nullable. This would be a bad experience when querying from the server as these are always defined.
  • Use Dictionaries or dynamic instead of strongly typed objects for deserialisation. I would like to avoid this because then we lose the formatting and autocompletion in PowerShell, and it becomes way harder to look up the merge strategies to use for properties...

felixfbecker avatar Aug 26 '18 16:08 felixfbecker

I think the best we could do until Kubernetes exposes the default value through OpenAPI is to maintain a hand-maintained list of default values in a map in the generator script, and apply that default value in the property definition.

felixfbecker avatar Aug 26 '18 18:08 felixfbecker

Interesting! I wonder how kubectl / client-go handle it. I take it the OpenAPI / Swagger document doesn't list these defaults? I'll take a look at both of these ideas this morning if I get time...

tintoy avatar Aug 26 '18 18:08 tintoy

kubectl apply circumvents the problem by only diffing maps instead of the struct types. I thought about this but I think it would severely impact the UX so I would rather hardcode defaults. The default value is not exposed through OpenAPI but usually mentioned in the field description. I suggested adding this in the comment I linked above.

I don't know how client-go handles this

felixfbecker avatar Aug 26 '18 18:08 felixfbecker

I like your idea - let's do that.

tintoy avatar Aug 26 '18 18:08 tintoy

So even if we have DefaultValue attributes on everything and initialisers, it would still mean that if a YAML does not declare a field and the value is different than the default on the server, it will reset the value to the default value. I wonder if that is acceptable behaviour or not. If we always ignore default values, it would be impossible to reset a value back to the default value.

Maybe we do need to make everything with a default nullable? So that null means "not specified"/"ignore" while the actual default value means "reset". The OpenAPI spec is inconsistent about this too - some scalars are nullable in addition to having a default value, while others are not.

The question is just how inconvenient this would make using the API client, if even fields are nullable that are guaranteed to be set to a default value by the server.

felixfbecker avatar Aug 26 '18 21:08 felixfbecker

Maybe we need to generate two sets of classes for API responses and for inputs? Where inputs just has more optional fields, so the API responses can extend the input classes and override the fields with default values to make them non-optional.

What do you usually do in C# in such cases? In TypeScript I can distinguish between null and undefined and scalars don't have zero values (undefined is the zero value for everything).

felixfbecker avatar Aug 26 '18 21:08 felixfbecker

Hmm, not sure about yamldotnet but JSON.NET has DefaultValueHandling so if the value is some well-known scalar value (or null) the member won't be serialised. Perhaps yamldotnet has something similar?

tintoy avatar Aug 26 '18 21:08 tintoy

https://github.com/aaubry/YamlDotNet/issues/21

tintoy avatar Aug 26 '18 21:08 tintoy

Serialisation is not the problem, it's generating the patch (diffing yaml vs server state)

felixfbecker avatar Aug 26 '18 21:08 felixfbecker

Ah. Yeah I could apply the DefaultValue attribute and you could look for that?

tintoy avatar Aug 26 '18 22:08 tintoy

You could treat that default value as "missing" or "unspecified" perhaps?

tintoy avatar Aug 26 '18 22:08 tintoy

If the attribute is.missing th

tintoy avatar Aug 26 '18 22:08 tintoy

Man, I hate doing GitHub on my phone :)

tintoy avatar Aug 26 '18 22:08 tintoy

If attribute is missing there is a way to determine default value for type but I'd have to look it up.

tintoy avatar Aug 26 '18 22:08 tintoy

Here you go:

https://stackoverflow.com/a/353073/885866

tintoy avatar Aug 26 '18 22:08 tintoy

Yeah, but if you always ignore the default value, then it would become impossible to set the field to the default value again after it was modified on the server...

felixfbecker avatar Aug 26 '18 22:08 felixfbecker

Here's another solution I just came up with:

Generate setters for every field (and backing properties) that will mark the property as dirty when set in a HashSet that can be queried publicly. E.g.

class KubeModel {
	protected HashSet<string> dirtyProperties = new HashSet<string>();
	public IImmutableSet<string> DirtyProperties {
		get { return dirtyProperties; }
	}
}
class DeploymentSpec extends KubeModel {
 	private int progressDeadlineSeconds;
    public int ProgressDeadlineSeconds {
   		get { return progressDeadlineSeconds; }
 		set {
  			progressDeadlineSeconds = value; 	
			dirtyProperties.Add("ProgressDeadlineSeconds");
		}
    }
}

It would be more code but it's generated, so doesn't really matter. This would not require changing types to nullable and also remove the need for maintaining a map of default values and even non-updateable properties (#29), because those properties would never get set by the YAML deserialisation. The diff implementation would ignore any properties that are not dirty.

The only other solution I can think of is to deserialise to Dictionaries, then build PSObjects and return those, but "lie" about the OutputType in the Cmdlet to make autocompletion still work. While building PSObjects, the "virtual" type needs to be added as a TypeName to make output formatting work. Then make the patch cmdlet convert back to dictionaries first, and make the diff implementation "track" the "virtual" model type of the dictionaries by starting at KubeResource and traversing through the Type as its traversing through the dictionaries, so it can retrieve the patch strategies etc from attributes. The disadvantage is that strictly Get-Kube* cmdlets now return a completely different type than ConvertFrom-KubeYaml.

What do you think?

felixfbecker avatar Aug 27 '18 09:08 felixfbecker

Interesting - what about just using the YAML document as-is; it only specifies fields that are specified after all :)

This is starting to get a little complex but I'm not against modifying the models in principle (or creating some sort of helper that can do the same). Let me have a think and see what I can do...

tintoy avatar Aug 27 '18 19:08 tintoy

What do you mean with as-is? Dictionary? We need to give it some type to serialize into

felixfbecker avatar Aug 27 '18 19:08 felixfbecker

What if we generate a second set of models in parallel to the first (in a namespace like KubeClient.Models.Tracked) which have the tracking behaviour and implicit / explicit cast operators (or AsTracked() / AsNonTracked()) to convert back and forth? The actual code for tracking changes can be improved using the [CallerMemberName] attribute (or whatever it's called).

tintoy avatar Aug 27 '18 19:08 tintoy

Not for the purposes of diff perhaps? Can't remember for yamldotnet but JSON.NET has JObject / Jtoken (a DOM for JSON, effectively). I imagine yamldotnet must have something similar? So I mean don't deserialise at all if you don't need to...

tintoy avatar Aug 27 '18 19:08 tintoy

Yep, they have a DOM for YAML:

https://github.com/aaubry/YamlDotNet/wiki/Samples.LoadingAYamlStream

tintoy avatar Aug 27 '18 19:08 tintoy

Yeah, but we can't operate on the YAML string ;) I think you mean what I wrote in my second idea:

The only other solution I can think of is to deserialise to Dictionaries, then build PSObjects and return those, but "lie" about the OutputType in the Cmdlet to make autocompletion still work. While building PSObjects, the "virtual" type needs to be added as a TypeName to make output formatting work. Then make the patch cmdlet convert back to dictionaries first, and make the diff implementation "track" the "virtual" model type of the dictionaries by starting at KubeResource and traversing through the Type as its traversing through the dictionaries, so it can retrieve the patch strategies etc from attributes. The disadvantage is that strictly Get-Kube* cmdlets now return a completely different type than ConvertFrom-KubeYaml.

If we go with separate class hierarchies, we'll have the same problem of fragmented types...

felixfbecker avatar Aug 27 '18 19:08 felixfbecker

Yep, they have a DOM for YAML

I believe the patch implementation shouldn't rely on the encoding being YAML though, so Dictionaries would be the better choice.

felixfbecker avatar Aug 27 '18 19:08 felixfbecker

If we go with separate class hierarchies, we'll have the same problem of fragmented types

A fair point.

We could, however, change the client so that its methods take / return a JObject/JToken (or a YamlDocument) instead of specific model types (i.e. no serialisation / deserialisation at all). Clients can then just use JObject.ToObject<TModel>()/JObject.ToObject(modelType) if required.

tintoy avatar Aug 27 '18 19:08 tintoy

I'm pretty comfortable with JToken and friends being passed around (it's a reasonably common way to pass untyped or semi-typed data around when that data was received in JSON format in the first place but the schema is not fixed); there's even reasonably high-fidelity conversion between YamlDocument and JSON:

https://github.com/aaubry/YamlDotNet/wiki/Samples.ConvertYamlToJson

tintoy avatar Aug 27 '18 19:08 tintoy

Seen this? https://github.com/wbish/jsondiffpatch.net

tintoy avatar Aug 27 '18 19:08 tintoy

I wouldn't want to have PowerShell users needing to do that though...

I have this vision of cmdlets that can all take the same strongly typed Kubernetes model objects as input and all output the same strongly typed objects. Trying to find a solution that gets as close to that as possible 🤔

Seen this? https://github.com/wbish/jsondiffpatch.net

Yeah I've seen that, it's not JSON Patch (RFC 6902) though. Kubernetes API only supports RFC 6902 JSON Patch, RFC 7386 JSON Merge Patch and their custom Strategic Merge Patch. I therefor went with JSON Patch.

felixfbecker avatar Aug 27 '18 19:08 felixfbecker

I agree that's the ideal way to go - but perhaps we might be able to simplify things a bit by keeping the 2 scenarios separate?

  1. Consumer wants to load objects, modify them, and then patch.
  2. Consumer wants to apply Kube YAML from a file.

For 1, this is a moot point because all fields are already populated; by definition, only fields they change will be changed (and need patch operations generated for them).

For 2, we have a set of keys from the YAML so we know what patch operations are required.

Have I missed anything, or does that more or less cover both scenarios?

tintoy avatar Aug 27 '18 19:08 tintoy

(so for 1, just do a property-level diff; all properties will have been populated before modifying)

tintoy avatar Aug 27 '18 19:08 tintoy