
While scanning a literal block scalar, found extra spaces in first line.

Open LaughingJohn opened this issue 4 years ago • 11 comments

Hi,

I have some YAML files from a 3rd party which I'm reading and converting to JSON to make it easier to process. For a few files on the deserialize step it is failing with "While scanning a literal block scalar, found extra spaces in first line".

Trace:

at YamlDotNet.Core.Scanner.ScanBlockScalarBreaks(Int32 currentIndent, StringBuilder breaks, Boolean isLiteral, Mark& end, Nullable`1& isFirstLine)
at YamlDotNet.Core.Scanner.ScanBlockScalar(Boolean isLiteral)
at YamlDotNet.Core.Scanner.FetchBlockScalar(Boolean isLiteral)

Code:

var yamlDeserializer = new DeserializerBuilder().Build();
var yamlObject = yamlDeserializer.Deserialize(sr); // fails here

Looking at the data it appears that the problem is the files have a multi-line literal with additional carriage returns at the beginning. I'm new to YAML, but I'm wondering why it isn't considered reasonable to have more than one CRLF or indeed more than one space at the start of some literal text? Apart from editing the files is there any way around this?

Example literal with extra CRLF at the start:

    Body: |+
      
      Begin forwarded message:

I presume it is to do with trying to establish the indentation - are the files invalid or should the scanner be reading until it finds a non-blank line?

Thanks.

LaughingJohn avatar Aug 24 '20 11:08 LaughingJohn

I think this issue warrants a response. We are grappling with the same problem manifesting in a different way. Whenever we have empty lines (auto-indented by text editors), our literal style is silently forced into double-quoted style. I found where this happens in the code: line 911 of Emitter.cs.

Rather than throwing an error, the Emitter object quietly changes the style to double-quoted if it decides the scalar in question does not qualify as a literal block scalar. Suggestion: allow forced styles, or throw an error if the requested style is not allowed.

wspresto avatar Oct 19 '20 16:10 wspresto

Of course a response is warranted. This is certainly a bug but I didn't have a chance to look into it. Do you want to help, @wspresto ?

aaubry avatar Oct 20 '20 09:10 aaubry

Thanks for the response @aaubry. I did start to look at the code, but unfortunately, like you, I don't really have the time to dedicate either. In the end I used some regex to fix up most of the files and did a few manually so that the parser could read them (this lost the first empty line, but in this case it doesn't really matter). I was doing a data migration, so it's a one-off exercise for me (hopefully). We've never been given YAML files before as a data extract, and having seen YAML and in particular the quality of the files we were given (the format was weird even for YAML) I hope to never see them again ;) Thank you for this library though, it definitely got me through and it would have been impossible without it!
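
For anyone needing a similar fix-up, here is a minimal sketch of one possible pre-processing step. It is not the regex mentioned above, and the names `YamlPreprocess` and `BlankWhitespaceOnlyLines` are made up for illustration. It turns lines that contain only spaces or tabs into truly empty lines while keeping the line itself and its line ending. Whether that is enough to get past the scanner error in a given YamlDotNet version is an assumption that has not been verified here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of YamlDotNet.
public static class YamlPreprocess
{
    // Matches a run of spaces/tabs that makes up an entire line.
    // The lookahead allows an optional CR so that CRLF files are handled too.
    private static readonly Regex WhitespaceOnlyLine =
        new Regex(@"^[ \t]+(?=\r?$)", RegexOptions.Multiline);

    // Removes the whitespace from whitespace-only lines; line breaks are untouched.
    public static string BlankWhitespaceOnlyLines(string yaml) =>
        WhitespaceOnlyLine.Replace(yaml, string.Empty);
}
```

The cleaned text could then be wrapped in a `StringReader` and passed to the deserializer. Note that this is lossy in some cases: in a block scalar with an explicit indentation indicator, spaces beyond the indentation are part of the content and would be dropped.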

LaughingJohn avatar Oct 27 '20 10:10 LaughingJohn

Hello @EdwardCooke and @JuergenGutsch,

I've encountered the same issue described here when trying to parse a valid YAML document containing an empty line after a block scalar indicator (|-). This has caused some trouble in my application, as it relies on parsing YAML files that might include this particular case.

Given that this issue has been open for some time, I wanted to kindly ask if there is any update or progress on addressing it? This problem significantly impacts the usability of the YamlDotNet library for certain use cases, and it would be great to have a resolution in the near future.

Thank you for your attention and for your work on this library. I appreciate your efforts to make YamlDotNet a reliable and robust tool for the community.

uanvas avatar May 11 '23 19:05 uanvas

@uanvas Let me have a look during the weekend.

JuergenGutsch avatar May 12 '23 07:05 JuergenGutsch

@uanvas Looking at the YAML specification, it seems it is not valid to start a literal block scalar with an empty line. A scalar followed by an empty line is an empty scalar, and if the line after the leading line break has a child indentation, that seems to be wrong. Actually, the specification doesn't mention leading empty lines, and I think it is missing some error specifications.

This means the specific files seem to be invalid.

JuergenGutsch avatar May 17 '23 21:05 JuergenGutsch

If it helps, quoting the content will make the leading empty line valid.

JuergenGutsch avatar May 17 '23 21:05 JuergenGutsch

The example by @LaughingJohn is valid. See also one of the spec examples: https://matrix.yaml.info/details/4QFQ.html However, more spaces in the first, "empty" line than the following indentation is invalid, like in that test case: https://matrix.yaml.info/details/5LLU.html
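
As a rough illustration of that rule, the sketch below auto-detects the content indentation of a block scalar the way the YAML 1.2 specification describes it when no indentation indicator is given: leading empty lines are skipped, the first non-empty line sets the indentation, and a leading empty line with more spaces than that is an error. The class and method names are hypothetical, this is not YamlDotNet's scanner code, and it ignores the parent indentation and explicit indicators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the spec rule, not YamlDotNet code.
public static class BlockScalarIndent
{
    // bodyLines: the lines following the `|` or `>` header.
    // Returns the detected content indentation, or throws FormatException
    // if a leading empty line has more spaces than the first non-empty line.
    public static int Detect(IEnumerable<string> bodyLines)
    {
        int maxLeadingBlank = 0;
        foreach (string raw in bodyLines)
        {
            string line = raw.TrimEnd('\r', '\n');

            // Count leading spaces only; a tab counts as content here.
            int spaces = 0;
            while (spaces < line.Length && line[spaces] == ' ') spaces++;

            if (spaces == line.Length)
            {
                // Empty or space-only line: remember the widest one and keep looking.
                maxLeadingBlank = Math.Max(maxLeadingBlank, spaces);
                continue;
            }

            if (maxLeadingBlank > spaces)
                throw new FormatException(
                    "A leading empty line has more spaces than the first non-empty line.");

            return spaces;
        }

        // No non-empty line at all: fall back to the longest space-only line.
        return maxLeadingBlank;
    }
}
```

With the body from the original report (an empty line of six spaces followed by a line indented six spaces) this returns 6 and raises no error. A leading empty line of four spaces followed by content indented two spaces throws.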

perlpunk avatar May 17 '23 22:05 perlpunk

@JuergenGutsch, thank you for checking the issue. I utilized common validators like https://www.yamllint.com/ to validate the YAML, which led me to believe that it is valid.

However, when it comes to quoting the content, I'm unable to do so since I am validating the provided YAML.

Body: |-
      
      Begin forwarded message:

uanvas avatar May 24 '23 15:05 uanvas

I ran into the same issue. Isn't the following valid yaml?

Body: |-4
        <-- space until here (8 spaces in total)
    Foo

Should result in " Foo" because with an explicit indentation of 4 the string is clearly defined. Online YAML-to-JSON converters convert this to the expected string value.

Note that adding a single tab in the string, say " \t\n Foo" works perfectly, so this feels like a bug.

Edit: js-yaml will also parse the given yaml without any error

gruenedd avatar Jul 19 '23 15:07 gruenedd

I’ll have to check, but I’m pretty sure there’s nothing in the YamlDotNet parser that will handle that.

EdwardCooke avatar Jul 19 '23 22:07 EdwardCooke