
Deserializing to JSON fails for integer

Open philippe-lavoie opened this issue 6 years ago • 18 comments

I have the following yaml file

id : 123
a : bob
child:
  x : 10
  y : 30

When reading the above

        using (var reader = new StringReader(yaml))
        {
            var deserializer = new YamlDotNet.Serialization.Deserializer();
            var serializer = new Newtonsoft.Json.JsonSerializer();
            var builder = new YamlDotNet.Serialization.DeserializerBuilder();
            var yamlObject = builder.Build().Deserialize(reader);
            var json = JsonConvert.SerializeObject(yamlObject);
            Console.Out.WriteLine(json);
        }

Then the output is

{"id":"123","a":"bob","child":{"x":"10","y":"30"}}

I would expect that the parser knows x and y are integers. This would have yielded

{"id":"123","a":"bob","child":{"x":10,"y":30}}

philippe-lavoie avatar Feb 08 '19 18:02 philippe-lavoie

I am having the same issue when it comes to Booleans.

var reader = new StringReader(@"test: true");
var deserializer = new DeserializerBuilder().Build();
var yamlObject = deserializer.Deserialize(reader);

var js = new SerializerBuilder().JsonCompatible().Build();

var json = js.Serialize(yamlObject);	

The JSON that is created ends up being {"test": "true"}. The value should be a boolean but ends up as a string.

DSchroer avatar Mar 14 '19 19:03 DSchroer

Any pointers on how this could be fixed? Maybe I'll have a look, as this is a showstopper for us.

dpoetzsch avatar Sep 24 '19 19:09 dpoetzsch

@dpoetzsch, if you know the shape of your data, the workaround is to create a strongly-defined type (in the example below, MyPoco) and deserialize to that.

using System;
using System.IO;
using System.Text;
using YamlDotNet.Serialization;

class MyPoco
{
    public int A { get; set; }
    public bool B { get; set; }
    public string C { get; set; }
}

class Program
{
    static void Main()
    {
        using var reader = new StringReader(@"
A: 123
B: false
C: bob
");
        var deserializer = new DeserializerBuilder().Build();
        var myPocoObject = deserializer.Deserialize<MyPoco>(reader);

        var serializer = new SerializerBuilder().EmitDefaults().JsonCompatible().Build();
        var jsonString = serializer.Serialize(myPocoObject);

        Console.WriteLine(jsonString);
    }
}

Outputs: {"A": 123, "B": false, "C": "bob"}

YDN Version used: 6.1.2.

am11 avatar Sep 24 '19 20:09 am11

Thanks, but sadly we do not have a strongly-defined type and rely on dynamic typing :( That's why I'd be very interested in helping to fix this issue.

dpoetzsch avatar Sep 25 '19 13:09 dpoetzsch

You could solve this problem with a custom INodeTypeResolver that inspects the content of the scalars to determine their type. I wrote a proof of concept like this:

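using System;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

public class InferTypeFromValue : INodeTypeResolver
{
	public bool Resolve(NodeEvent nodeEvent, ref Type currentType)
	{
		var scalar = nodeEvent as Scalar;
		if (scalar != null)
		{
			// TODO: This heuristic could be improved
			// Treat any scalar whose text parses as an int as an int.
			int value;
			if (int.TryParse(scalar.Value, out value))
			{
				currentType = typeof(int);
				return true;
			}
		}
		// Not a scalar, or not an int: leave the type to the other resolvers.
		return false;
	}
}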

It needs to be registered on the DeserializerBuilder:

var builder = new DeserializerBuilder()
	.WithNodeTypeResolver(new InferTypeFromValue());

Which then produces the desired results:

{"id":123,"a":"bob","child":null,"x":10,"y":30}

You can check a working sample here.

aaubry avatar Sep 25 '19 15:09 aaubry

@aaubry Hey, I just tried your solution and it seems to work quite flawlessly for all primitive types. Are there any potential pitfalls you can see on the horizon with this approach?

M3psipax avatar Sep 26 '19 16:09 M3psipax

This seems to work for me as well. Thanks.

@M3psipax The only thing that jumps to mind is the special Boolean types in YAML. You probably need an Enum to parse those.

https://yaml.org/type/bool.html
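
To make the boolean point concrete, here is a minimal sketch of a helper that recognizes the YAML 1.1 boolean spellings listed on that page. The class name Yaml11Bool and its TryParse method are hypothetical and not part of YamlDotNet. Whether YamlDotNet then converts every one of these spellings once a resolver reports typeof(bool) is an assumption worth checking against your version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of YamlDotNet): maps the YAML 1.1 boolean
// spellings from https://yaml.org/type/bool.html to a .NET bool.
public static class Yaml11Bool
{
    // The spec lists exact spellings, so the comparison is case-sensitive.
    private static readonly HashSet<string> TrueValues = new HashSet<string>(StringComparer.Ordinal)
    {
        "y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"
    };

    private static readonly HashSet<string> FalseValues = new HashSet<string>(StringComparer.Ordinal)
    {
        "n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"
    };

    // Returns true and sets value when text is one of the listed spellings.
    public static bool TryParse(string text, out bool value)
    {
        if (text != null && TrueValues.Contains(text))
        {
            value = true;
            return true;
        }
        if (text != null && FalseValues.Contains(text))
        {
            value = false;
            return true;
        }
        value = false;
        return false;
    }
}
```

In a resolver like InferTypeFromValue, a call such as Yaml11Bool.TryParse(scalar.Value, out _) could sit next to the int.TryParse check and set currentType to typeof(bool). Restricting it to plain (unquoted) scalar values would avoid turning a quoted "yes" into a boolean.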

DSchroer avatar Sep 26 '19 16:09 DSchroer

@M3psipax I don't foresee any potential problem with this approach.

aaubry avatar Sep 28 '19 07:09 aaubry

Do we have any updates? Or has the project been closed?

kAleksei avatar Oct 05 '22 18:10 kAleksei

It's not closed, I just haven't had much time lately to keep going down the list of open issues. Do you still have problems with the latest release? I had implemented some code in there that takes a plain scalar string and converts it to the primitives. It's an opt-in feature. Not sure if it would solve your problem.

ecooke-macu avatar Oct 06 '22 17:10 ecooke-macu

@edwardcookemacu, could you specify how to opt in to parsing true/false as booleans? I ran into the problem when converting YAML to JSON: boolean values end up as strings, which I would like to avoid. Update, for everyone who stumbles on this: calling .WithAttemptingUnquotedStringTypeDeserialization() on DeserializerBuilder seems to do the trick.
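
For anyone who wants to try it, a minimal sketch of the opt-in in a YAML-to-JSON round trip might look like this. The YamlJson class and its Convert method are hypothetical names used only for illustration, and the sketch assumes a YamlDotNet release recent enough to include .WithAttemptingUnquotedStringTypeDeserialization().

```csharp
using System;
using System.IO;
using YamlDotNet.Serialization;

// Hypothetical wrapper for illustration; not part of YamlDotNet.
public static class YamlJson
{
    public static string Convert(string yaml)
    {
        using var reader = new StringReader(yaml);

        // Opt in: plain (unquoted) scalars are converted to primitives
        // where possible instead of being left as strings.
        var deserializer = new DeserializerBuilder()
            .WithAttemptingUnquotedStringTypeDeserialization()
            .Build();
        var yamlObject = deserializer.Deserialize(reader);

        // Emit the untyped object graph as JSON-compatible YAML.
        var serializer = new SerializerBuilder().JsonCompatible().Build();
        return serializer.Serialize(yamlObject);
    }
}
```

Calling Console.WriteLine(YamlJson.Convert("test: true")) should then print the boolean without quotes, which is the behavior described above.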

Ekkeir avatar Mar 08 '23 10:03 Ekkeir

Hi guys, I guess my YAML file is a bit more complex or just big, and .WithAttemptingUnquotedStringTypeDeserialization() takes too long. So I used @aaubry's code with some additions, and it executes practically instantly. If anyone needs it:

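using System;
using System.Globalization;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

public class InferTypeFromValue : INodeTypeResolver
{
    public bool Resolve(NodeEvent nodeEvent, ref Type currentType)
    {
        // Only plain (unquoted) values are candidates; keys and quoted scalars stay strings.
        if (nodeEvent is Scalar { IsKey: false, Style: ScalarStyle.Plain } scalar)
        {
            if (Boolean.TryParse(scalar.Value, out bool _))
            {
                currentType = typeof(bool);
                return true;
            }
            // Parse numbers with the invariant culture so the result does not depend on the machine's locale.
            if (Int32.TryParse(scalar.Value, NumberStyles.Integer, CultureInfo.InvariantCulture, out int _))
            {
                currentType = typeof(int);
                return true;
            }
            if (Double.TryParse(scalar.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out double _))
            {
                currentType = typeof(double);
                return true;
            }
        }
        return false;
    }
}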

davorinpuhar avatar Nov 22 '23 14:11 davorinpuhar

What exception are you getting? Can you send a simple example of yaml and code that reproduces the problem?

EdwardCooke avatar Nov 22 '23 16:11 EdwardCooke

Yaml can't be publicly published but I will try to find the problem with elimination.

davorinpuhar avatar Nov 23 '23 08:11 davorinpuhar

I was being impatient: .WithAttemptingUnquotedStringTypeDeserialization() did complete, but it took a couple of minutes. The exceptions I was seeing in the debug window were just "Exception thrown: 'System.FormatException' in System.Private.CoreLib.dll", probably expected on every unsuccessful parse attempt. I will stick to the INodeTypeResolver solution for performance reasons for now. Thanks to all the developers for a great library.

davorinpuhar avatar Nov 23 '23 15:11 davorinpuhar

It's interesting that it took that long to complete. Is your YAML really big? I believe there's some regex in there that could potentially be replaced with something else, which may increase performance. All of our tests are pretty small, and we don't have many benchmarks. It's one thing I've been wanting to work on since I've seen some performance questions mentioned in the issues. There's just no time to do the cool stuff like that right now.

EdwardCooke avatar Nov 23 '23 19:11 EdwardCooke

My yaml is 2544 lines long and quite diverse. The exceptions that are routinely thrown are probably also bad for performance.

davorinpuhar avatar Nov 24 '23 06:11 davorinpuhar

Those exceptions come from the underlying framework's .TryParse methods when determining the type. I suspect they're a lot more accurate and more efficient than a bunch of regexes. I can try to do some benchmarking, but it'll take some time and will probably be relatively error-prone, so I'm a little hesitant to do that. 2544 lines isn't that big at all; that should go pretty quickly. Can you send me what you have for your DeserializerBuilder? That would help in diagnosing the performance.

ecooke-macu avatar Nov 27 '23 15:11 ecooke-macu