pyyaml
PyYaml can't parse mappings with lists as keys
Here is an example of valid YAML from the YAML spec (example 2.11):
? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
However, PyYaml can't parse this:
import yaml

yaml.safe_load("""? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]""")
yaml.constructor.ConstructorError: while constructing a mapping
  in "<unicode string>", line 1, column 1:
    ? [ New York Yankees,
    ^
found unhashable key
  in "<unicode string>", line 1, column 3:
    ? [ New York Yankees,
      ^
This must be a bug, as it violates the spec. It could be fixed by parsing the list as a tuple.
So it appears to be parsing correctly, but I'm not aware of a generic way to use unhashable keys in a standard Python mapping (and neither is BaseConstructor, which is why it bombs). As you point out, a specific hack to use a tuple could probably work in the list case, but it's not a general solution to the problem. For example, try it with a mapping as a key, or any other unhashable Python object, and you'll be right back in the same boat.
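To make the hashability point concrete, here is a small plain-Python sketch (no PyYAML involved). The helper name `try_key` is made up for illustration; it just reports whether a value is accepted as a dict key.

```python
def try_key(key):
    """Return True if `key` can be used as a dict key, False otherwise."""
    try:
        {key: "value"}
    except TypeError:
        # Python raises TypeError ("unhashable type") for lists, dicts, etc.
        return False
    return True


results = {
    "list": try_key(["New York Yankees", "Atlanta Braves"]),
    "tuple": try_key(("New York Yankees", "Atlanta Braves")),
    "dict": try_key({"team": "Atlanta Braves"}),
    # A tuple is only hashable if everything inside it is hashable too.
    "tuple containing a list": try_key((["nested"],)),
}

for kind, ok in results.items():
    print(f"{kind}: {'ok' if ok else 'unhashable'}")
```

So a list-to-tuple substitution covers flat sequences of scalars, but a mapping key or a nested list inside the key still fails.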
Someone brought this up a long time ago in an SO thread (https://stackoverflow.com/questions/13538015/sequence-as-key-of-yaml-mapping-in-python), but it doesn't look like it really went anywhere.
Off the top of my head, the only thing I can think of would be to create a generic hashable container object for unhashable keys that holds a reference to the actual data, but I'm not sure how useful that would be in the real world. You wouldn't be able to use it for lookups against the returned data structure (since the "real" key is an artificial surrogate), so it would presumably only be useful for iteration. Would something like that solve your issue?
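A rough sketch of what such a surrogate container might look like, purely as an illustration: `KeyRef` is a hypothetical name and nothing like it exists in PyYAML. It relies on Python's default identity-based hashing, which is exactly why lookups by value don't work.

```python
class KeyRef:
    """Hypothetical hashable wrapper holding a reference to an unhashable key.

    No __eq__/__hash__ are defined, so Python falls back to identity-based
    hashing and comparison: two wrappers around the same list are different keys.
    """

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"KeyRef({self.value!r})"


teams = ["New York Yankees", "Atlanta Braves"]
mapping = {KeyRef(teams): ["2001-07-02", "2001-08-12", "2001-08-14"]}

# Lookup with a fresh wrapper fails, even though it wraps the same list.
found = KeyRef(teams) in mapping

# Iteration still gives access to the real key data.
pairs = [(key.value, value) for key, value in mapping.items()]
```

Here `found` is `False`, while `pairs` still exposes the original list, which matches the "only useful for iteration" caveat.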
On a related note: it makes me feel slightly better that ruamel.yaml fails in exactly the same way. ;)
(PS, you should be able to subclass XConstructor and override construct_mapping to do the tuple thing or anything else you want today)
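For anyone who wants the workaround today, a minimal sketch of that subclass approach might look like the following. `TupleKeyLoader` is a made-up name, and this is not the change proposed in any PR. It assumes keys need to be constructed with `deep=True`, since as far as I can tell PyYAML otherwise fills sequences in lazily and the key would still be empty when it is converted.

```python
import yaml


class TupleKeyLoader(yaml.SafeLoader):
    """Hypothetical loader: turns list-typed mapping keys into tuples."""

    def construct_mapping(self, node, deep=False):
        if not isinstance(node, yaml.MappingNode):
            raise yaml.constructor.ConstructorError(
                None, None,
                "expected a mapping node, but found %s" % node.id,
                node.start_mark)
        # Keep SafeLoader's merge-key ("<<") handling.
        self.flatten_mapping(node)
        mapping = {}
        for key_node, value_node in node.value:
            # Build the key fully before converting it (assumption: needed
            # because sequences are otherwise populated after the fact).
            key = self.construct_object(key_node, deep=True)
            if isinstance(key, list):
                key = tuple(key)
            try:
                hash(key)
            except TypeError:
                # Mapping keys, nested lists, etc. still fail as before.
                raise yaml.constructor.ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    "found unhashable key", key_node.start_mark)
            mapping[key] = self.construct_object(value_node, deep=deep)
        return mapping


doc = """? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
"""
data = yaml.load(doc, Loader=TupleKeyLoader)
print(data[("New York Yankees", "Atlanta Braves")])
```

This only handles flat list keys. A key that is a mapping, or a list containing another list, still raises `ConstructorError`.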
I did a PR for this last year: #159
"[...] that ruamel.yaml fails in exactly the same way."
Really? It works for me here with ruamel. The only thing ruamel does not support is nested lists.
edit: here are the test results for that specific YAML: http://matrix.yaml.io/details/M5DY.html#ruamel-py
It would be cool if someone could have a look at my PR (#159). It still might need some work (what if we have circular aliases?), but if there's something wrong in general with it, I'd like to know.
It's a pity this hasn't moved for a while. I think conforming to the YAML specification is important functionality.
I agree that, at least for the reasonably simple cases presented, we should be able to handle representing list keys as tuples. I'm willing to work on this for v.next. #159 would need a bit of rework around the error handling to be more robust (and a couple more tests), but I think @perlpunk's underlying concept is sound. I'll add it to the planning project.
I think we need to be very clear, though, about what will work and what won't: this is only about substituting actual list-typed keys with tuples. Any other unhashable Python type, if encountered in that situation, will continue to fail. I think the risk of backward-incompatible changes is otherwise pretty low, since the customization to BaseConstructor is limited to construct_mapping, which hardcodes a dict as the mapping type, so anyone who has customized PyYAML to default to anything other than dict has already overridden this method on the constructor anyway.
I am currently running into this problem, as I need to parse a YAML document containing lists as keys. Is this issue still active? I can see a PR was proposed a long time ago.
Same here
Just want to add to the chorus here. @perlpunk and @nitzmahone, if you need a pair of eyes or a little bit of effort, let me know and I'll jump in.