[Possible Parser Bug] Key entries with the token ','
Hello ruby maintainers of psych-yaml,
I may have found a small bug. Perhaps it is not a "real" bug, but the error reporting by psych was strange still, so I report it here.
First, so that you can reproduce it, here is the minimum example to demonstrate the problem:
---
bla: bar
,abc: foo
Edit: Seems github wants markdown parsing, so the line before bla: is actually three times - which I think is the default header for yaml files.
So the problem is the second entry, called ",abc", in particular the key ',' there. If this ',' token is removed, or if we put this in '' strings such as:
',abc': foo
Then it works.
If I load this right now, I get:
#<Psych::SyntaxError: (foo.yml): did not find expected key while parsing a block mapping at line 2 column 1>
The problem is that Psych actually reports the wrong line. It reports the error very early on.
How did I discover this problem?
I have a huge yaml file, about 20.000 lines of code. I was making such a change with a leading ',' token and forgot the next day where I made it.
So when I tried to load the .yml file, psych was pointing out that there was an error. But the message it showed me was not useful - the error was not on that particular line.
It took me some attempts to isolate the faulty lines, by cutting down on a copy of that big .yml file.
So what is my proposal to change this?
Well, I assume here that the specification is clear and that leading ',' tokens may not be correct. I don't know whether this is true or not, but let's first assume this is the case, as otherwise that behaviour has to change as well.
My gripe here is mostly with psych reporting the wrong line. It seems to instead report the very first entry found, rather than the line itself.
You can test this yourself. Simply add more correct lines to that .yml file, before the faulty line, and afterwards, yet psych will still only always claim that the error happens near the beginning of the yaml dataset, which I think is not ideal. The real error is only (!) on the line with the ',' token. At the least I believe this would be more useful to report this to the end user too.
You may of course say that the current behaviour of psych is correct, which may be true - but the current behaviour is not really very helpful to users who try to debug the problem at hand. At the least in my use case, it would have been better to specifically report which line was flawed. No clue if there is a good simple way to change this behaviour, but perhaps in the long run, if anyone thinks about making psych a bit more user-friendly, similar how the did-you-mean gem helped improve (some) error messages for ruby, then perhaps something similar could happen to psych.
I am using psych all the time now, used to use syck instead for many, many years, but psych seems to work better now - and perhaps being stricter than syck also was the right decision in the long run.