YamlDotNet
Deserializing to JSON fails for integer
I have the following yaml file
id : 123
a : bob
child:
  x : 10
  y : 30
When reading the above
using (var reader = new StringReader(yaml))
{
    // Deserialize without a target type, then hand the result to Json.NET.
    var builder = new YamlDotNet.Serialization.DeserializerBuilder();
    var yamlObject = builder.Build().Deserialize(reader);
    var json = JsonConvert.SerializeObject(yamlObject);
    Console.Out.WriteLine(json);
}
Then the output is
{"id":"123","a":"bob","child":{"x":"10","y":"30"}}
I would expect that the parser knows x and y are integers. This would have yielded
{"id":"123","a":"bob","child":{"x":10,"y":30}}
I am having the same issue when it comes to Booleans.
var reader = new StringReader(@"test: true");
var deserializer = new DeserializerBuilder().Build();
var yamlObject = deserializer.Deserialize(reader);
var js = new SerializerBuilder().JsonCompatible().Build();
var json = js.Serialize(yamlObject);
The json that is created ends up being:
{"test": "true"}
It should be a boolean but ends up a string.
Any pointers how this could be fixed? Maybe I'll have a look as this is a show stopper for us.
@dpoetzsch, if you know the shape of your data, the workaround is to create a strongly-defined type (MyPoco in the example below) and deserialize to that.
using System;
using System.IO;
using YamlDotNet.Serialization;

class MyPoco
{
    public int A { get; set; }
    public bool B { get; set; }
    public string C { get; set; }
}

class Program
{
    static void Main()
    {
        using var reader = new StringReader(@"
A: 123
B: false
C: bob
");
        var deserializer = new DeserializerBuilder().Build();
        var myPocoObject = deserializer.Deserialize<MyPoco>(reader);
        var serializer = new SerializerBuilder().EmitDefaults().JsonCompatible().Build();
        var jsonString = serializer.Serialize(myPocoObject);
        Console.WriteLine(jsonString);
    }
}
Outputs: {"A": 123, "B": false, "C": "bob"}
YDN Version used: 6.1.2.
Thanks, but sadly we do not have a strongly-defined type and rely on dynamic typing :( That's why I'd be very interested in helping to fix this issue.
You could solve this problem with a custom INodeTypeResolver that inspects the content of the scalars to determine their type. I wrote a proof of concept like this:
public class InferTypeFromValue : INodeTypeResolver
{
    public bool Resolve(NodeEvent nodeEvent, ref Type currentType)
    {
        var scalar = nodeEvent as Scalar;
        if (scalar != null)
        {
            // TODO: This heuristic could be improved
            int value;
            if (int.TryParse(scalar.Value, out value))
            {
                currentType = typeof(int);
                return true;
            }
        }
        return false;
    }
}
It needs to be registered on the DeserializerBuilder:
var builder = new DeserializerBuilder()
.WithNodeTypeResolver(new InferTypeFromValue());
Which then produces the desired results:
{"id":123,"a":"bob","child":null,"x":10,"y":30}
You can check a working sample here.
@aaubry Hey, I just tried your solution and it seems to work quite flawlessly for all primitive types. Are there any potential pitfalls you can see on the horizon with this approach?
This seems to work for me as well. Thanks.
@M3psipax The only thing that jumps to mind is the special Boolean types in YAML. You probably need an Enum to parse those.
https://yaml.org/type/bool.html
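To illustrate that pitfall: the YAML 1.1 bool type linked above lists more spellings than Boolean.TryParse accepts (y, yes, on and their negative counterparts). Below is a minimal sketch of a lookup that a resolver or converter could call. The class and method names are hypothetical and not part of YamlDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of YamlDotNet.
public static class YamlBool
{
    // Spellings listed by the YAML 1.1 bool type; matched case-sensitively,
    // so e.g. "yEs" is deliberately not accepted.
    private static readonly HashSet<string> TrueValues =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "y", "Y", "yes", "Yes", "YES",
            "true", "True", "TRUE",
            "on", "On", "ON"
        };

    private static readonly HashSet<string> FalseValues =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "n", "N", "no", "No", "NO",
            "false", "False", "FALSE",
            "off", "Off", "OFF"
        };

    // Returns true when the text is one of the listed spellings and
    // stores the corresponding boolean in value.
    public static bool TryParse(string text, out bool value)
    {
        if (TrueValues.Contains(text))
        {
            value = true;
            return true;
        }
        if (FalseValues.Contains(text))
        {
            value = false;
            return true;
        }
        value = false;
        return false;
    }
}
```

Whether short forms such as y, n, on and off should really become booleans is a design choice. They are easy to confuse with ordinary string values, so it may be safer to accept only the subset your documents actually use.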
@M3psipax I don't foresee any potential problem with this approach.
Do we have any updates? Or has the project been closed?
It's not closed, I just haven't had much time lately to keep going down the list of open issues. Do you still have problems with the latest release? I had implemented some code in there that takes a plain scalar string and converts it to the primitives. It's an opt-in feature. Not sure if it would solve your problem.
@edwardcookemacu, could you specify the way to opt-in for true/false parsing as booleans?
I encountered the problem when converting yaml to json: boolean values end up as strings, which I would like to avoid.
UPD: for everyone who stumbles on it - calling .WithAttemptingUnquotedStringTypeDeserialization() on DeserializerBuilder seems to do the trick.
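A minimal sketch of that opt-in, assuming a YamlDotNet release recent enough to include WithAttemptingUnquotedStringTypeDeserialization(). The sample YAML and variable names are made up for illustration.

```csharp
using System;
using System.IO;
using YamlDotNet.Serialization;

// Illustrative input: one boolean, one integer, one plain string.
var yaml = "test: true\ncount: 10\nname: bob\n";

// Opt in to converting unquoted (plain) scalars to primitive types.
var deserializer = new DeserializerBuilder()
    .WithAttemptingUnquotedStringTypeDeserialization()
    .Build();

// No target type is given, so the result is an untyped object graph.
var yamlObject = deserializer.Deserialize(new StringReader(yaml));

// Re-emit the object graph as JSON.
var serializer = new SerializerBuilder().JsonCompatible().Build();
Console.WriteLine(serializer.Serialize(yamlObject));
```

With the opt-in enabled, test and count should come out unquoted in the JSON while name stays a string. This is worth checking against the version you are using.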
Hi guys, I guess my yaml file is a bit more complex or just big, and .WithAttemptingUnquotedStringTypeDeserialization() takes too long. So I used @aaubry's code with some additions, and it executes practically instantly. If anyone needs it:
public class InferTypeFromValue : INodeTypeResolver
{
    public bool Resolve(NodeEvent nodeEvent, ref Type currentType)
    {
        if (nodeEvent is Scalar { IsKey: false, Style: ScalarStyle.Plain } scalar)
        {
            if (Boolean.TryParse(scalar.Value, out bool _))
            {
                currentType = typeof(bool);
                return true;
            }
            if (Int32.TryParse(scalar.Value, out int _))
            {
                currentType = typeof(int);
                return true;
            }
            if (Double.TryParse(scalar.Value, out double _))
            {
                currentType = typeof(double);
                return true;
            }
        }
        return false;
    }
}
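One caveat with the TryParse overloads used above: they parse with the current culture by default, so a value such as 1.5 may be interpreted differently on a machine whose culture uses a comma as the decimal separator. Below is a sketch of the same checks pinned to the invariant culture, written as a standalone helper so it can be tried without YamlDotNet. The name ScalarTypeGuesser is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of YamlDotNet.
public static class ScalarTypeGuesser
{
    // Returns the most specific primitive type the text parses as,
    // falling back to string. The order matters: bool, then int, then double.
    public static Type Guess(string text)
    {
        if (bool.TryParse(text, out _))
        {
            return typeof(bool);
        }
        if (int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out _))
        {
            return typeof(int);
        }
        // NumberStyles.Float leaves out thousands separators, so "1,000" stays a string.
        if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
        {
            return typeof(double);
        }
        return typeof(string);
    }
}
```

Inside the resolver, each `TryParse` call could be swapped for the culture-invariant overload shown here. Integers that do not fit in an int fall through to double in this sketch, which may or may not be what you want.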
What exception are you getting? Can you send a simple example of yaml and code that reproduces the problem?
The yaml can't be published publicly, but I will try to find the problem by elimination.
I was being impatient: .WithAttemptingUnquotedStringTypeDeserialization() did complete, but it took a couple of minutes. The exceptions I was seeing in the debug window were just "Exception thrown: 'System.FormatException' in System.Private.CoreLib.dll", probably expected on every unsuccessful parse attempt.
I will stick to the INodeTypeResolver solution for performance reasons for now. Thanks to all the developers for a great library.
It's interesting that it took that long to complete. Is your yaml really big? I believe there's some regex in there that could potentially be replaced, which may increase performance. All of our tests are pretty small, and we don't have many benchmarks. It's one thing I've been wanting to work on since I've seen some performance questions mentioned in the issues. Just no time to do the cool stuff like that right now.
My yaml is 2544 lines long and quite diverse. The exceptions that are routinely thrown are probably also bad for performance.
Those exceptions come from the underlying framework's .TryParse methods when determining the type. I suspect they're a lot more accurate and more efficient than a bunch of regexes. I can try to do some benchmarking, but it'll take some time and will probably be relatively error-prone, so I'm a little hesitant to do that. 2544 lines isn't that big at all; that should go pretty quickly. Can you send me what you have for your DeserializerBuilder? That would help in diagnosing the performance.