Different behaviour between psych and libyaml on indentation
Hi,
When I try to parse the following YAML snippet:
- indentation 2 spaces
- indentation 2 spaces
- indentation 0 space
It fails with libyaml with error code = 4 (i.e. YAML_PARSER_ERROR) and description = did not find expected <document start>.
With psych, it does not complain at all.
Any idea why?
Thanks!
I don't really understand. Can you provide some code as an example? Psych uses libyaml, so it should be the same.
Hi,
Thanks for your fast reply.
So I downloaded and compiled libyaml from the Github repo here.
Then I used the run-load.c test to load a sample file with the YAML snippet above and it fails (i.e. once compiled, I run the test like this: ./run-load test.yaml).
But when I use psych.load_file() in pure Ruby to load the same file, it succeeds.
I would expect the same behaviour i.e. both failing to load the file.
Let me know if you need more details.
Thanks!
I have the same issue:
test code:
#!/usr/bin/env ruby
require 'psych'
require 'pp'
pp Psych.load_file(ARGV.first)
bad.yaml contains (note the single space before abc):
abc: 123
xyz: 456
and produces a unsatisfying result. where's xyz?:
{"abc"=>123}
good.yaml contains:
abc: 123
xyz: 456
and produces the expected result:
{"abc"=>123, "xyz"=>456}
This mal-feature is such that if you start a file/text-stream with N spaces, then after it encounters a line without the N leading-spaces, the remainder of the text is ignored.
abc: def
biz: buzz
buzz: biz
axe: hatchet
birds: fowl
xyz: zyx
produces:
{"abc"=>"def", "biz"=>"buzz"}
Switching gears, i tried the same files with PERL's yaml reader using this script.
#!/usr/bin/env perl
use YAML;
use Data::Dumper;
my $config = YAML::LoadFile(@ARGV[0]);
print Dumper($config);
from above examples, bad.yaml produces:
Code: YAML_PARSE_ERR_INCONSISTENT_INDENTATION
Line: 2
Document: 1
at /usr/local/share/perl5/YAML/Loader.pm line 736.
and good.yaml produces:
$VAR1 = {
'abc' => '123',
'xyz' => '456'
};
I feel like Psych should raise an exception for this broken input instead of silently ignoring the remainder of the file.
Further diagnosis:
#parse_stream correctly throws a syntax error:
2.1.5 :257 > Psych.parse_stream(" a: 1\nb: 2")
Psych::SyntaxError: (<unknown>): did not find expected <document start> at line 1 column 1
from /home/ben/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/psych.rb:373:in `parse'
from /home/ben/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/psych.rb:373:in `parse_stream'
from (irb):257
from /home/ben/.rvm/rubies/ruby-2.1.5/bin/irb:11:in `<main>'
however #parse does not.
2.1.5 :258 > Psych.parse(" a: 1\nb: 2")
=> #<Psych::Nodes::Document:0x00000000a3fb10 @children=[#<Psych::Nodes::Mapping:0x00000000a3fa98 @children=[#<Psych::Nodes::Scalar:0x00000000a3f9d0 @value="a", @anchor=nil, @tag=nil, @plain=true, @quoted=false, @style=1>, #<Psych::Nodes::Scalar:0x00000000a3f8b8 @value="1", @anchor=nil, @tag=nil, @plain=true, @quoted=false, @style=1>], @anchor=nil, @tag=nil, @implicit=true, @style=1>], @version=[], @tag_directives=[], @implicit=true, @implicit_end=true>
from my layman's perspective, something funky seems to be happening with the return in Psych.parse which appears to eat the exception? Using a return inside a block seems a little odd to me, and in most cases i can't figure out how the fallback value is used here, unless somehow the block never gets called.
def self.parse yaml, filename = nil, fallback = false
parse_stream(yaml, filename) do |node|
return node
end
fallback
end
@tenderlove maybe someone else can take a look at this? I isolated the problem, but am not sure how to fix it without breaking other stuff. I feel like the current behavior (die silently) is a serious flaw though.
Strings can contain multiple documents, and it looks like libyaml thinks this particular thing is two documents. If I try like this:
ruby -I lib -r psych -e'Psych.parse_stream(" a: 1\nb: 2") { |x| p x }'
The output is this:
#<Psych::Nodes::Document:0x00007fac1d026888 @children=[#<Psych::Nodes::Mapping:0x00007fac1d026608 @children=[#<Psych::Nodes::Scalar:0x00007fac1d026248 @value="a", @anchor=nil, @tag=nil, @plain=true, @quoted=false, @style=1, @start_line=0, @start_column=1, @end_line=0, @end_column=2>, #<Psych::Nodes::Scalar:0x00007fac1d026068 @value="1", @anchor=nil, @tag=nil, @plain=true, @quoted=false, @style=1, @start_line=0, @start_column=4, @end_line=0, @end_column=5>], @anchor=nil, @tag=nil, @implicit=true, @style=1, @start_line=0, @start_column=1, @end_line=1, @end_column=0>], @version=[], @tag_directives=[], @implicit=true, @implicit_end=true>
Traceback (most recent call last):
2: from -e:1:in `<main>'
1: from /Users/aaron/git/psych/lib/psych.rb:380:in `parse_stream'
/Users/aaron/git/psych/lib/psych.rb:380:in `parse': (<unknown>): did not find expected <document start> at line 1 column 1 (Psych::SyntaxError)
The first document parses, the second one does not.
Psych.load only streams and loads the first document as it doesn't know how to merge documents. I could have Psych.load raise an exception if there are multiple documents, but I'm not really sure what to do.
I checked with PyYaml and it also detects and errors on faulty indentation - isn't it also using libyaml? Perhaps worth looking at how it arrives at the indentation being an error. (Here is our ticket https://tickets.puppetlabs.com/browse/PUP-8547 raised by puppet users).
Confirm. The problem turned into painful consequences.
Below is the example which gets silently parsed into { a:1, b: 2} what is an obvious bug.
--
#someobj:
a: 1
b: 2
somevar: 1
My suggestion:
- Fail if multiple documents are detected on
#load. - Add additional
#load_allAPI to mimic the originalYaml::load()andYaml::loadAll().