YamlDotNet icon indicating copy to clipboard operation
YamlDotNet copied to clipboard

Literal scalar style ignored in case of leading space in line

Open werwolfby opened this issue 5 years ago • 2 comments

I can't find why when I specify string with leading space in some line it is stored as DoubleQuoted instead of Literal.

Example:

class Program
{
    static void Main(string[] args)
    {
        var serializer = new SerializerBuilder()
            .WithEventEmitter(e => new LiteralMultilineEventEmitter(e))
            .Build();

        var obj = "line with leading space \nother line";
        Console.WriteLine(serializer.Serialize(obj));
    }

    private class LiteralMultilineEventEmitter : ChainedEventEmitter
    {
        public LiteralMultilineEventEmitter(IEventEmitter nextEmitter) : base(nextEmitter)
        {
        }

        public override void Emit(ScalarEventInfo eventInfo, IEmitter emitter)
        {
            if (eventInfo.Source.Type == typeof(string) && eventInfo.Source.Value is string value && value.Contains("\n"))
            {
                eventInfo.Style = ScalarStyle.Literal;
            }
            base.Emit(eventInfo, emitter);
        }
    }
}

Returns:

"line with leading space \nother line"

While if I change value obj value to:

var obj = "line with leading space\nother line";

without space then it is serialized right:

|-
  line with leading space
  other line

I've debug code to Emitter where I can see that this is expected behavior, but I can't find such behavior in documentation.

More over even if I will try to read this Literal string from YAML it will be deserialized right as expected:

    static void Main(string[] args)
    {
        var serializer = new SerializerBuilder()
            .WithEventEmitter(e => new LiteralMultilineEventEmitter(e))
            .Build();

        var obj = "line with leading space \nother line";
        Console.WriteLine(serializer.Serialize(obj));

        var yaml =
            "|-\r\n" +
            "  line with leading space \r\n" +
            "  other line";
        
        var deserializer = new DeserializerBuilder().Build();
        var deserializedValue = deserializer.Deserialize<string>(yaml);
        Console.WriteLine(deserializedValue);
        Console.WriteLine(deserializedValue == obj);
    }

With output:

line with leading space 
other line
True

werwolfby avatar Dec 09 '19 08:12 werwolfby

I've found the same code in PyYAML.

werwolfby avatar Dec 09 '19 08:12 werwolfby

I found the same issue. It's due to the internal check for valid literal strings in Yaml not permitting spaces prior to line breaks. I wrote an event emitter to "clean" up these strings as we wanted all literal strings to appear that way and trailing space removal wouldn't break the data (it contained Html / JavaScript etc).

    public class LiteralScalarCleanerEmitter : ChainedEventEmitter
    {
        private static readonly Regex Pattern = new Regex(@"\s*$", RegexOptions.Compiled | RegexOptions.Multiline);

        public LiteralScalarCleanerEmitter(IEventEmitter nextEmitter)
            : base(nextEmitter)
        {
        }

        public override void Emit(ScalarEventInfo eventInfo, IEmitter emitter)
        {
            if (eventInfo.Source.Type == typeof(string) && eventInfo.Source.ScalarStyle == ScalarStyle.Literal)
            {
                var value = (string)eventInfo.Source.Value;

                if (!string.IsNullOrEmpty(value))
                {
                    eventInfo.RenderedValue = Pattern.Replace(value, string.Empty);
                }
            }

            base.Emit(eventInfo, emitter);
        }
    }

stuartbean avatar Jul 15 '20 06:07 stuartbean