YamlDotNet
While scanning a literal block scalar, found extra spaces in first line.
Hi,
I have some YAML files from a third party which I'm reading and converting to JSON to make them easier to process. For a few files, the deserialize step fails with "While scanning a literal block scalar, found extra spaces in first line".
Trace:
at YamlDotNet.Core.Scanner.ScanBlockScalarBreaks(Int32 currentIndent, StringBuilder breaks, Boolean isLiteral, Mark& end, Nullable`1& isFirstLine)
at YamlDotNet.Core.Scanner.ScanBlockScalar(Boolean isLiteral)
at YamlDotNet.Core.Scanner.FetchBlockScalar(Boolean isLiteral)
Code:
var yamlDeserializer = new DeserializerBuilder().Build();
var yamlObject = yamlDeserializer.Deserialize(sr); // <- fails here; sr reads the YAML file
Looking at the data, the problem appears to be that these files contain a multi-line literal that starts with additional line breaks (CRLF). I'm new to YAML, so I'm wondering why more than one CRLF, or indeed more than one space, isn't considered reasonable at the start of some literal text. Apart from editing the files, is there any way around this?
Example literal with extra CRLF at the start:
Body: |+
Begin forwarded message:
I presume it has to do with establishing the indentation. Are the files invalid, or should the scanner keep reading until it finds a non-blank line?
Thanks.
I think this issue warrants a response. We are grappling with the same problem, manifesting in a different way: whenever a scalar contains empty lines (auto-indented by text editors), our literal style is silently forced to double-quoted. I found where this happens in the code: line 911 of Emitter.cs.
Rather than throwing an error, the Emitter quietly switches the style to double-quoted if it decides the scalar in question cannot be emitted as a literal block. Suggestion: allow forcing a style, or throw an error when the requested style is not allowed.
Of course a response is warranted. This is certainly a bug, but I haven't had a chance to look into it. Do you want to help, @wspresto?
Thanks for the response @aaubry. I did start looking at the code, but unfortunately, like you, I don't really have the time to dedicate to it either. In the end I used some regex to fix up most of the files and fixed a few manually so that the parser could read them (this lost the first empty line, but in this case it doesn't really matter). I was doing a data migration, so it's a one-off exercise for me (hopefully). We've never been given YAML files as a data extract before, and having seen YAML, and in particular the quality of the files we were given (the format was weird even for YAML), I hope never to see them again ;) Thank you for this library though; it definitely got me through, and the job would have been impossible without it!
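The regex fix-up itself isn't shown in the thread. One possible version, as a minimal sketch assuming only spaces (not tabs) should be removed, blanks out every line that consists solely of spaces, so a "blank" first line of a block scalar can no longer carry more spaces than the text that follows it, which is what the scanner rejects. `YamlPreprocess` and `BlankSpaceOnlyLines` are hypothetical names, not YamlDotNet API. The trade-off: a spaces-only line inside a literal block that was deliberately more indented loses those spaces.

```csharp
using System.Text.RegularExpressions;

// Hypothetical pre-processing step (not part of YamlDotNet): empties every
// line that contains only spaces. Lines containing a tab are left alone,
// because inside a literal block the tab is content.
public static class YamlPreprocess
{
    // Multiline makes ^ and $ match at each line. In .NET, $ matches before
    // '\n' but not before "\r\n", so \r? is needed for CRLF files.
    static readonly Regex SpaceOnlyLine =
        new Regex(@"^ +(?=\r?$)", RegexOptions.Multiline);

    public static string BlankSpaceOnlyLines(string yaml) =>
        SpaceOnlyLine.Replace(yaml, string.Empty);
}
```

The cleaned text can then go to the deserializer through a `StringReader`, e.g. `yamlDeserializer.Deserialize(new StringReader(YamlPreprocess.BlankSpaceOnlyLines(File.ReadAllText(path))))`, where `path` is a placeholder for the input file.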
Hello @EdwardCooke and @JuergenGutsch,
I've encountered the same issue described here when trying to parse a valid YAML document containing an empty line after a block scalar indicator (|-). This has caused some trouble in my application, as it relies on parsing YAML files that might include this particular case.
Given that this issue has been open for some time, I wanted to kindly ask if there is any update or progress on addressing it? This problem significantly impacts the usability of the YamlDotNet library for certain use cases, and it would be great to have a resolution in the near future.
Thank you for your attention and for your work on this library. I appreciate your efforts to make YamlDotNet a reliable and robust tool for the community.
@uanvas Let me have a look during the weekend.
@uanvas Looking at the YAML specification, it seems it is not valid to start a literal block scalar with an empty line. A scalar followed by an empty line is an empty scalar, and if the line after that leading line break has child indentation, it seems to be wrong. Admittedly, the specification doesn't mention leading empty lines explicitly, and I'd expect it to spell out more of the error cases.
This means these specific files seem to be invalid.
If it helps, quoting the content will make the leading empty line valid.
The example by @LaughingJohn is valid. See also one of the spec examples (https://matrix.yaml.info/details/4QFQ.html). However, a first "empty" line with more spaces than the indentation of the following lines is invalid, as in this test case: https://matrix.yaml.info/details/5LLU.html
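The rule described above can be sketched as a small check. This is a hypothetical helper, not YamlDotNet's scanner: it applies the YAML 1.2 auto-detection rule for a block scalar without an indentation indicator, where the content indentation is the number of leading spaces on the first non-empty line, and a leading empty line may not contain more spaces than that line. It assumes it receives only the scalar's body lines (without the `Body: |+` header) and, as a simplification, treats a line as empty only if it consists entirely of spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of YamlDotNet) applying the YAML 1.2
// auto-detection rule for a block scalar with no indentation indicator.
public static class BlockIndent
{
    // lines: the scalar's body lines, header line excluded.
    // Returns the detected content indentation, or throws if a leading
    // empty line has more spaces than the first non-empty line.
    public static int Detect(IReadOnlyList<string> lines)
    {
        int maxLeadingEmpty = 0;
        foreach (var line in lines)
        {
            int spaces = CountLeadingSpaces(line);
            bool isEmpty = spaces == line.Length; // spaces only, or nothing
            if (isEmpty)
            {
                maxLeadingEmpty = Math.Max(maxLeadingEmpty, spaces);
                continue;
            }
            // First non-empty line: it fixes the indentation.
            if (maxLeadingEmpty > spaces)
                throw new FormatException(
                    "A leading empty line has more spaces than the first non-empty line.");
            return spaces;
        }
        // No non-empty line: use the longest run of spaces seen.
        return maxLeadingEmpty;
    }

    static int CountLeadingSpaces(string line)
    {
        int n = 0;
        while (n < line.Length && line[n] == ' ') n++;
        return n;
    }
}
```

With this check, `Detect(new[] { "", "  Begin forwarded message:" })` returns 2, while `Detect(new[] { "    ", "  Begin forwarded message:" })` throws, because the blank first line carries four spaces against a content indentation of two.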
@JuergenGutsch, thank you for checking the issue. I used common validators such as https://www.yamllint.com/ to check the YAML, which led me to believe that it is valid.
However, quoting the content isn't an option for me, since I'm validating the YAML as it is provided.
Body: |-
Begin forwarded message:
I ran into the same issue. Isn't the following valid YAML?
Body: |-4
<-- space until here (8 spaces in total)
Foo
Should result in " Foo"
because with an explicit indentation of 4, the string is clearly defined.
Online YAML-to-JSON converters produce the expected string value.
Note that adding a single tab in the string, say " \t\n Foo"
works perfectly, so this feels like a bug.
Edit: js-yaml also parses the given YAML without any error.
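To see why the explicit indicator removes the ambiguity, here is a minimal sketch (a hypothetical helper, not YamlDotNet code) of reading a literal block scalar with strip chomping and an explicit indentation indicator under YAML 1.2. With `|-4` under a mapping at column 0, the content indentation is fixed at 4, so a spaces-only line longer than 4 is ordinary content rather than a malformed "empty" line. It assumes the caller passes only the scalar's body lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of YamlDotNet: reads the body of a literal
// block scalar that has an explicit indentation indicator and strip ("-")
// chomping, e.g. "Body: |-4" under a mapping at column 0 (indent = 4).
public static class ExplicitIndentLiteral
{
    public static string ReadStripped(IReadOnlyList<string> lines, int indent)
    {
        var content = new List<string>();
        foreach (var line in lines)
        {
            bool spacesOnly = line.All(ch => ch == ' ');
            if (spacesOnly && line.Length <= indent)
            {
                // Nothing beyond the indentation: an empty line.
                content.Add("");
                continue;
            }
            if (line.Length < indent || line.Take(indent).Any(ch => ch != ' '))
                throw new FormatException("Body line is indented less than the indicator requires.");
            // Everything after the first `indent` spaces is content, including
            // the extra spaces of an otherwise blank, more-indented line.
            content.Add(line.Substring(indent));
        }
        // Strip chomping: join without a final line break and drop the
        // line breaks contributed by trailing empty lines.
        return string.Join("\n", content).TrimEnd('\n');
    }
}
```

The exact whitespace of the example above did not survive, so assume for illustration an 8-space first line and `Foo` indented by 6 spaces: `ReadStripped(new[] { "        ", "      Foo" }, 4)` returns `"    \n  Foo"`, i.e. four spaces, a line break, then two spaces and `Foo`.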
I’ll have to check, but I’m pretty sure there’s nothing in the YamlDotNet parser that handles that.