
Determining the type from a field

Open felixfbecker opened this issue 5 years ago • 11 comments

I want to deserialize Kubernetes YAML that looks like this:

apiVersion: v1
kind: Deployment
metadata:
  name: proxy
  namespace: prod
  # ...
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  # ...
---
apiVersion: v1
kind: Service
metadata:
  name: proxy
  namespace: prod
  # ...
spec:
  ports:
  - name: http
    port: 8080
    targetPort: http
  type: ClusterIP
  # ...

The type of the object (what other fields besides apiVersion and kind the object will have) is determined by the kind field. Depending on the value of kind, I need to map this YAML to e.g. either a Service or a Deployment model class.

I tried TypeResolvers and TypeConverters but couldn't figure it out. Is this possible somehow with YamlDotNet?

felixfbecker avatar Aug 19 '18 18:08 felixfbecker

I have a similar need when dealing with polymorphic objects. Is there an example for TypeResolvers and TypeConverters?

bloudraak avatar Sep 23 '18 19:09 bloudraak

I also have a similar need. My current workaround is a two-phase deserialization. I first deserialize to Dictionary<object,object>, then traverse the key-value pairs, and once I have determined the type (kind), I serialize the value again and deserialize it using the determined type. It's ugly, but it works.
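A minimal sketch of that dictionary round trip, assuming a single YAML document per input string. The model classes and the `DictionaryRoundTrip` helper are hypothetical and heavily trimmed; only the `DeserializerBuilder`/`SerializerBuilder` calls come from YamlDotNet, and exact return types may differ between library versions.

```csharp
using System;
using System.Collections.Generic;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

// Hypothetical, trimmed-down models (not taken from the thread).
public class DeploymentSpec { public int Replicas { get; set; } }
public class Deployment { public string Kind { get; set; } public DeploymentSpec Spec { get; set; } }
public class ServiceSpec { public string Type { get; set; } }
public class Service { public string Kind { get; set; } public ServiceSpec Spec { get; set; } }

public static class DictionaryRoundTrip
{
    public static object Load(string yaml)
    {
        IDeserializer deserializer = new DeserializerBuilder()
            .WithNamingConvention(CamelCaseNamingConvention.Instance)
            .IgnoreUnmatchedProperties() // the trimmed models omit most fields
            .Build();
        ISerializer serializer = new SerializerBuilder().Build();

        // Phase 1: untyped parse, then read the discriminator.
        var raw = deserializer.Deserialize<Dictionary<object, object>>(yaml);
        string kind = raw.TryGetValue("kind", out object value) ? value as string : null;

        // Phase 2: write the dictionary back to YAML and parse it as the concrete type.
        string roundTripped = serializer.Serialize(raw);
        switch (kind)
        {
            case "Deployment": return deserializer.Deserialize<Deployment>(roundTripped);
            case "Service": return deserializer.Deserialize<Service>(roundTripped);
            default: throw new NotSupportedException($"Unsupported kind: '{kind}'");
        }
    }
}
```

The cost is one extra serialize/parse cycle per document, which is usually acceptable for configuration-sized input.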

gliljas avatar Sep 26 '18 12:09 gliljas

I've implemented variations on this problem with a slightly different 2-phase deserialization. Here are three different ways to handle similar situations.

Polymorphic Document Type

Use 2-phase deserialization. This is like the Kubernetes example @felixfbecker posted. Instead of deserializing to a Dictionary, deserialize to a custom type with just a kind field:

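    // Probe type that only captures the "kind" field.
    // (A C# member cannot share its enclosing type's name, so the class is not called Kind.)
    public class KindProbe
    {
        public string Kind { get; set; }
    }

    public void DoImportantStuff()
    {
        // The deserializer must map "kind" to Kind (e.g. a camel-case naming convention)
        // and ignore unmatched properties, otherwise the first pass throws on the other fields.
        Deserializer deserializer = ...;
        KindProbe k = deserializer.Deserialize<KindProbe>(input);
        if (k.Kind == "Deployment")
        {
            Deployment d = deserializer.Deserialize<Deployment>(input);
        }
        // etc...
    }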
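For reference, the two-phase idea above could be fleshed out roughly as follows. This is a sketch under assumptions: `KindProbe`, `KindDispatcher`, and the trimmed model classes are hypothetical names, the input is a single YAML document held in a string (so it can be read twice), and the builder options shown are one way to satisfy the two passes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

// First-pass probe: only the discriminator field is read.
public class KindProbe { public string Kind { get; set; } }

// Hypothetical, heavily trimmed models; real Kubernetes objects have many more fields.
public class DeploymentSpec { public int Replicas { get; set; } }
public class Deployment { public string Kind { get; set; } public DeploymentSpec Spec { get; set; } }
public class ServiceSpec { public string Type { get; set; } }
public class Service { public string Kind { get; set; } public ServiceSpec Spec { get; set; } }

public static class KindDispatcher
{
    // Explicit allowlist mapping each "kind" value to a model type.
    private static readonly Dictionary<string, Type> KnownKinds = new Dictionary<string, Type>
    {
        ["Deployment"] = typeof(Deployment),
        ["Service"] = typeof(Service),
    };

    private static readonly IDeserializer Yaml = new DeserializerBuilder()
        .WithNamingConvention(CamelCaseNamingConvention.Instance) // "kind" -> Kind
        .IgnoreUnmatchedProperties()                              // the probe skips everything else
        .Build();

    public static object Load(string yaml)
    {
        // Pass 1: read only the kind.
        KindProbe probe = Yaml.Deserialize<KindProbe>(yaml);
        if (probe?.Kind == null || !KnownKinds.TryGetValue(probe.Kind, out Type target))
            throw new NotSupportedException($"Unknown or missing kind: '{probe?.Kind}'");

        // Pass 2: parse the same text again as the concrete type.
        return Yaml.Deserialize(new StringReader(yaml), target);
    }
}
```

Keeping the kind-to-type mapping in a dictionary avoids a growing if/else chain, and unknown kinds fail with a clear error instead of silently producing a half-filled object.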

Use a Tag to Indicate the Concrete Type

When deserializing polymorphic fields inside the object hierarchy, I indicate the concrete type with a YAML tag. You can do this either by using DeserializerBuilder.WithTagMapping (easier) or by implementing a custom INodeTypeResolver (more flexible).

Yaml

target: !TargetingData # !TargetingData indicates the concrete type of the field
  targetAcquisitionType: CurrentTarget

C# using WithTagMapping:

    public Deserializer MakeAssetResolvingDeserializer()
    {
        var deserializerBuilder = new DeserializerBuilder()
            .WithTagMapping("!TargetingData", typeof(TargetingData));
        Deserializer deserializer = deserializerBuilder.Build();
        return deserializer;
    }

C# using custom INodeTypeResolver:

    public sealed class AllUniqueTypeNamesTagNodeTypeResolver : INodeTypeResolver
    {
        private readonly IDictionary<string, Type> _tagMappings;

        public AllUniqueTypeNamesTagNodeTypeResolver()
        {
            // create mappings so that the yaml parser can recognize that, for example,
            // items tagged with "!ScriptableObject" should be deserialized as a ScriptableObject.
            IDictionary<string, Type> typesByName = AssembliesTypeCatalog.Instance.UniqueTypesByName;
            var tagMappings = typesByName.ToDictionary(kv => "!" + kv.Key, kv => kv.Value);
            this._tagMappings = tagMappings;
        }

        bool INodeTypeResolver.Resolve(NodeEvent nodeEvent, ref Type currentType)
        {
            string typeName = nodeEvent.Tag; // this is what gets the "!TargetingData" tag from the yaml
            Type predefinedType;
            if (!string.IsNullOrEmpty(typeName))
            {
                bool arrayType = false;
                if (typeName.EndsWith("[]")) // this handles tags for array types like "!TargetingData[]"
                {
                    arrayType = true;
                    typeName = typeName.Substring(0, typeName.Length-2);
                }

                if (_tagMappings.TryGetValue(typeName, out predefinedType))
                {
                    currentType = arrayType ? predefinedType.MakeArrayType() : predefinedType;
                    return true;
                }
                else
                {
                    throw new YamlException(
                        $"I can't find the type '{nodeEvent.Tag}'. Is it spelled correctly? If there are" +
                        $" multiple types named '{nodeEvent.Tag}', you must use the fully qualified type name.");
                }
            }
            return false;
        }
    }

    public Deserializer MakeAssetResolvingDeserializer()
    {
        var deserializerBuilder = new DeserializerBuilder()
            .WithNodeTypeResolver(new AllUniqueTypeNamesTagNodeTypeResolver(),
                s => s.After<YamlDotNet.Serialization.NodeTypeResolvers.TagNodeTypeResolver>());
        Deserializer deserializer = deserializerBuilder.Build();
        return deserializer;
    }

Resolving a Reference to an External Object

I needed to solve a related problem where the concrete instance to deserialize is fetched from an external repository. I handle this case with a custom INodeDeserializer in a way similar to the first 2-phase Kubernetes example. First the external reference is deserialized to a custom AssetRef class, then the actual object is resolved from an external database.

Yaml:

damageSoundEffect: # The type of this property in the containing class is `UnityEngine.Object`
  path: Assets/Audio/Shared/Glbl_Combat_Player_Damage_Lrg_Audio.asset # "path" is not a field of `UnityEngine.Object`. Instead, this will be used to resolve an existing instance at parse-time.

C# (trimmed on the fly, so it may have errors)

    /// <summary>
    /// A reference to a UnityObject in the project Assets folder
    /// </summary>
    public class AssetRef
    {
        public string Path { get; set; }
    }

    private class UnityAssetNodeDeserializer : INodeDeserializer
    {
        private readonly INodeDeserializer _nodeDeserializer;
        public UnityAssetNodeDeserializer(INodeDeserializer nodeDeserializer)
        {
            _nodeDeserializer = nodeDeserializer;
        }

        bool INodeDeserializer.Deserialize(IParser parser, Type expectedType, Func<IParser, Type, object> nestedObjectDeserializer, out object value)
        {
            if (typeof(UnityEngine.Object).IsAssignableFrom(expectedType))
            {
                // For unity objects, intercept deserialization and just look them up in the asset database instead.

                // attempt to deserialize the yaml node as an asset ref instead of a standard POD object.
                if (!_nodeDeserializer.Deserialize(parser, typeof(AssetRef), nestedObjectDeserializer, out value))
                    return false;

                // use the deserialized asset path to lookup the actual asset and substitute the value
                var assetRef = ((AssetRef)value);
                value = AssetDatabase.LoadAssetAtPath(assetRef.Path, expectedType);
                return true;
            }
            else
            {
                return _nodeDeserializer.Deserialize(parser, expectedType, nestedObjectDeserializer, out value);
            }
        }
    }

    public Deserializer MakeAssetResolvingDeserializer()
    {
        var deserializerBuilder = new DeserializerBuilder()
            .WithNodeDeserializer(
                inner => new UnityAssetNodeDeserializer(inner), // resolves assets by path using the AssetDatabase
                s => s.InsteadOf<ObjectNodeDeserializer>());
        Deserializer deserializer = deserializerBuilder.Build();
        return deserializer;
    }

thisisthedave avatar Sep 26 '18 21:09 thisisthedave

Thanks @thisisthedave. This helped a lot.

I originally wanted to parse a Document that looks something like this:

public class Document
{
      public List<Resource> Resources { get; set; }
}

public abstract class Resource
{
     public string Kind { get; set; }
     public string Key { get; set; }
     // more common properties
}

public sealed class VirtualMachine : Resource
{
     public string Image { get; set; }
     // more common properties
}

public sealed class Network : Resource
{
     public List<Subnet> Subnets { get; set; }
     // more common properties
}

public sealed class StorageAccount : Resource
{
     public bool Encrypted { get; set; }
     // more common properties
}

Which could result in a document as follows:

resources:
- kind: virtual-machine
  key: vm1
  image: bla
- kind: network
  key: vnet1
  subnets:
  - name: subnet1
  - name: subnet2
- kind: storage-account
  key: sa1
  encrypted: true

I think I can do that with tags as you described.
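A sketch of what the tag-based variant might look like for this document, assuming the `kind` key is replaced by a YAML tag on each list item. The tag names (`!virtual-machine`, `!network`) and the trimmed classes are illustrative assumptions; `WithTagMapping` is the builder method mentioned earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

public class Document { public List<Resource> Resources { get; set; } }
public abstract class Resource { public string Key { get; set; } }
public class Subnet { public string Name { get; set; } }
public sealed class VirtualMachine : Resource { public string Image { get; set; } }
public sealed class Network : Resource { public List<Subnet> Subnets { get; set; } }

public static class TaggedResources
{
    // Each list item carries a tag instead of a "kind" key.
    public const string Sample =
        "resources:\n" +
        "- !virtual-machine\n" +
        "  key: vm1\n" +
        "  image: bla\n" +
        "- !network\n" +
        "  key: vnet1\n" +
        "  subnets:\n" +
        "  - name: subnet1\n" +
        "  - name: subnet2\n";

    public static Document Load(string yaml)
    {
        IDeserializer deserializer = new DeserializerBuilder()
            .WithNamingConvention(CamelCaseNamingConvention.Instance)
            // Only tags registered here are resolved to concrete types.
            .WithTagMapping("!virtual-machine", typeof(VirtualMachine))
            .WithTagMapping("!network", typeof(Network))
            .Build();
        return deserializer.Deserialize<Document>(yaml);
    }
}
```

The trade-off is that the YAML itself has to change: documents that must keep a plain `kind` key (such as Kubernetes manifests) cannot use this approach as-is.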

bloudraak avatar Oct 04 '18 15:10 bloudraak

I'm sorry I was unable to answer this question in a timely fashion. As you have certainly moved on to other things, I will close this issue, but feel free to reopen it if this is still an issue.

aaubry avatar Sep 25 '19 22:09 aaubry

@aaubry no worries. Could you tell me which of the approaches you would recommend? I.e. which primitive in YamlDotNet is best to solve this problem?

felixfbecker avatar Sep 25 '19 22:09 felixfbecker

Currently there's no good way to achieve this, besides the two passes that were described above. This would definitely be useful but it is not clear how this could be achieved with the current implementation. The problem is that the deserializer was designed to work in a streaming fashion, as the YAML specification suggests. This means that as soon as we encounter a key, we need to be able to determine which property it will correspond to, therefore the destination type must already be known. Supporting this would probably require some kind of buffering mechanism.

aaubry avatar Sep 26 '19 00:09 aaubry

I want to achieve polymorphism in YAML like this:

Foo:
  FooTypeA:
    SomeProperty: 1
    SomeOtherProperty: 2

Foo is a property on a class and it is declared as an abstract type, say FooBase. FooTypeA is the name of the concrete class that inherits from FooBase. SomeProperty and SomeOtherProperty are properties on FooTypeA. This is how Ansible YAML files are structured, so I am not the only one who follows this pattern.

There doesn't seem to be a way to get YamlDotNet to allow this. INodeTypeResolver looks promising, but it only gets MappingStart and I can't see the scalar after that to read FooTypeA and direct it accordingly. How can I get access to IParser in INodeTypeResolver?

I have absolutely no clue what to do with INodeDeserializer if I am supposed to use that.

ArrowRaider avatar Nov 20 '20 22:11 ArrowRaider

I also had need for abstract type resolving. I prototyped a solution which I am currently using.

Please see: https://gist.github.com/atruskie/bfb7e9ee3df954a29cbc17bdf12405f9

In short, it replaces the ObjectNodeDeserializer. When it encounters a known abstract type or interface it buffers all nodes for the current mapping and sends that buffer to code (ITypeDiscriminator) that inspects the parsing event chain to determine what type to provide to the deserializer.

It's cool because any part of a yaml document could be used to determine which child type should be used (including comments theoretically).

In my gist I included two type discriminators that I made for my purpose:

  • one that used the value of a common kind key to determine the type
  • one that used the presence of a key that is unique to each child type to determine the type
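The decision logic behind those two discriminators can be sketched independently of the parser. The code below is not taken from the gist: it assumes the buffered mapping has already been reduced to its top-level keys and scalar values, and every name in it (`Discriminators`, `ByKindValue`, `ByUniqueKey`, the empty model classes) is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical models standing in for the child types of an abstract Resource.
public abstract class Resource { }
public sealed class VirtualMachine : Resource { }
public sealed class Network : Resource { }

public static class Discriminators
{
    // Strategy 1: the value of a common "kind" key selects the type.
    public static Type ByKindValue(
        IReadOnlyDictionary<string, string> scalars,
        IReadOnlyDictionary<string, Type> kindToType)
    {
        return scalars.TryGetValue("kind", out string kind)
            && kindToType.TryGetValue(kind, out Type type) ? type : null;
    }

    // Strategy 2: the presence of a key unique to one child type selects the type.
    public static Type ByUniqueKey(
        IEnumerable<string> keys,
        IReadOnlyDictionary<string, Type> uniqueKeyToType)
    {
        foreach (string key in keys)
        {
            if (uniqueKeyToType.TryGetValue(key, out Type type))
                return type;
        }
        return null; // no discriminating key found
    }
}
```

For example, with `image` mapped to `VirtualMachine` and `subnets` mapped to `Network`, a mapping whose keys are `key` and `subnets` would resolve to `Network`. Returning null leaves it to the caller to decide whether an unresolved type is an error.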

Hope this helps!

@aaubry if my solution even remotely passes for what you'd consider an optimal solution, I'll turn it into a PR

atruskie avatar Nov 24 '20 10:11 atruskie

Any updates on this issue (serialization of abstract types)? Also, @atruskie, your solution is the best that I've seen by far. But both options, writing a type resolver and tagging, require (correct me if I'm wrong) knowing in advance which derived types exist. That is of no use if we deal with unknown types, unless we keep track of abstract types and their derivatives by introducing something like an object mapper. To me the solution to this problem seems relatively easy to implement (though I don't know whether it contradicts the YAML specification): if the resolver comes across a type that derives from an abstract class or interface, it adds an additional "type" field to the node.

I also thought of implementing a "higher-level serializer" that checks whether the type is derived and a "lower-level serializer" whose only job is to write the actual YAML text to a file, but it may harm performance...

pmikstacki avatar May 08 '21 17:05 pmikstacki

@pmikstacki to deserialize any type, where the type name is encoded in a key, read the value of some key in the type resolver and use reflection to instantiate the type instance.

Be warned though, this is super dangerous from a security point of view.

More likely, you'll have a limited set of types you'll want to allow to be deserialized. In that case you can use reflection to iterate over all subclasses of an abstract type (or all implementers of an interface) and add them to the allowed types at app startup.
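Building such an allowlist at startup might look like the sketch below. It uses only standard .NET reflection; the `AllowedTypes` helper and the sample `Resource` hierarchy are hypothetical, and keying the result by the simple type name is an assumption that only holds while those names are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical hierarchy used to demonstrate discovery.
public abstract class Resource { }
public sealed class VirtualMachine : Resource { }
public sealed class Network : Resource { }

public static class AllowedTypes
{
    // Collects every concrete class in the assembly that derives from (or implements) baseType.
    // ToDictionary throws if two discovered types share the same simple name.
    public static IReadOnlyDictionary<string, Type> DiscoverSubtypes(Type baseType, Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract && baseType.IsAssignableFrom(t))
            .ToDictionary(t => t.Name, t => t, StringComparer.Ordinal);
    }
}
```

A type resolver can then look the name up in this dictionary and reject anything that is missing, so a document can never cause an arbitrary type to be instantiated.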

atruskie avatar May 08 '21 23:05 atruskie

Since there are multiple solutions to this problem listed in here, I'm going to close this issue.

EdwardCooke avatar Jan 15 '23 05:01 EdwardCooke