pyyaml
ENH: Keep information about line numbers
It would be great for further processing and giving user-friendly feedback about semantic errors if there was a way to store metadata about loaded objects. This metadata could include:
- line number or origin
- original text representation
There already seems to be a rough implementation over at Stack Overflow: https://stackoverflow.com/questions/13319067/parsing-yaml-return-with-line-number
This would be handy! Having spent a day trying to monkey patch a solution (and failing)...
I would love this, too!
If you don't mind working with PyYAML's internal data structures, here's a way:
import yaml
content = """- a
- { b: 0, c: 10 }
- d
"""
loader = yaml.Loader(content)
node = loader.get_single_node()
print(node)
Running the above piece of code will produce the following output (save for the formatting):
SequenceNode(tag='tag:yaml.org,2002:seq', value=[
    ScalarNode(tag='tag:yaml.org,2002:str', value='a'),
    MappingNode(tag='tag:yaml.org,2002:map', value=[
        (
            ScalarNode(tag='tag:yaml.org,2002:str', value='b'),
            ScalarNode(tag='tag:yaml.org,2002:int', value='0')
        ), (
            ScalarNode(tag='tag:yaml.org,2002:str', value='c'),
            ScalarNode(tag='tag:yaml.org,2002:int', value='10')
        )
    ]),
    ScalarNode(tag='tag:yaml.org,2002:str', value='d')
])
Also see:
- SafeLoader: https://github.com/yaml/pyyaml/blob/master/lib/yaml/loader.py#L31
- load(): https://github.com/yaml/pyyaml/blob/master/lib/yaml/__init__.py#L74
- get_single_data(): https://github.com/yaml/pyyaml/blob/master/lib/yaml/constructor.py#L47
- get_single_node(): https://github.com/yaml/pyyaml/blob/master/lib/yaml/composer.py#L29
(permanent links, like: github.com/yaml/pyyaml/blob/8cdff2c80573b8be8e8ad28929264a913a63aa33/lib/yaml/loader.py#L31)
@mathieucaroff
I think that's exactly the same as:
import yaml
content = """- a
- { b: 0, c: 10 }
- d
"""
node = yaml.compose(content)
print(node)
@ingydotnet
I think that's exactly the same as (...)
I agree, thanks! I find your solution, yaml.compose(x), to be a better way to obtain the node tree of the document. It is shorter and cleaner, and it calls .dispose() on the loader in a finally block.
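Since the original request was about line numbers, here is a minimal sketch of reading them off the composed nodes. It assumes the start_mark and end_mark attributes that PyYAML sets on each node, whose line and column are 0-based; the rows list and the message format are only illustrative, and SafeLoader is passed explicitly as a choice of this sketch.

```python
import yaml

content = """- a
- { b: 0, c: 10 }
- d
"""

# Compose the node tree without constructing Python objects.
root = yaml.compose(content, Loader=yaml.SafeLoader)

# Marks are 0-based, so add 1 to get human-friendly line and column numbers.
rows = []
for child in root.value:
    start = child.start_mark
    rows.append((child.id, start.line + 1, start.column + 1))

for kind, line, column in rows:
    print("%s node starts at line %d, column %d" % (kind, line, column))
```

Each node also carries an end_mark, so the original text of a node can be recovered by slicing the source between the two marks if needed.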
Yet another approach.
import yaml

class MyLoader(yaml.SafeLoader):

    def __init__(self, stream):
        super().__init__(stream)
        self.locations = {}

    def compose_node(self, parent, index):
        node = super().compose_node(parent, index)
        node._myloader_location = (self.line, self.column)
        return node

    def construct_object(self, node, deep=False):
        obj = super().construct_object(node, deep=deep)
        key = id(obj)
        if key in self.locations:
            self.locations[key] = None
        else:
            self.locations[key] = node._myloader_location
        return obj

    @classmethod
    def load(cls, stream):
        loader = cls(stream)
        try:
            return loader.get_single_data(), loader.locations
        finally:
            loader.dispose()
MyLoader.load(stream) returns a 2-tuple. The first element is the object that would be returned by a call to yaml.safe_load() (list, dictionary, scalar, None, etc.), and the second element is a dictionary containing the location information ((line, column) tuples) for most of the objects within the "normal" object, indexed by their ID.
For example:
>>> yml = "- foo\n- bar\n- baz\n"
>>> print(yml)
- foo
- bar
- baz
>>> doc, locations = MyLoader.load(yml)
>>> doc
['foo', 'bar', 'baz']
>>> locations[id(doc[0])]
(1, 0)
>>> locations[id(doc[1])]
(2, 0)
>>> locations[id(doc[2])]
(3, 0)
This approach does have two limitations.
First, if more than one occurrence of a singleton (such as None) is encountered, no location is returned for that object ID. (The location dictionary will map the singleton's ID to None.)
>>> yml = "-\n-\n-\n"
>>> print(yml)
-
-
-
>>> doc, locations = MyLoader.load(yml)
>>> doc
[None, None, None]
>>> print(locations[id(doc[0])])
None
>>> locations
{140414278062784: (3, 0), 140414513597984: None}
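One possible way around the singleton problem is to key locations by the path to each value instead of by object ID. The sketch below is only an illustration: the locate() helper is hypothetical, it parses the document a second time with yaml.compose(), it assumes scalar mapping keys (the raw key text is used in the path), and it does not guard against recursive aliases.

```python
import yaml

def locate(node, path=(), out=None):
    """Map each path (tuple of indexes/keys) to a 1-based (line, column)."""
    if out is None:
        out = {}
    out[path] = (node.start_mark.line + 1, node.start_mark.column + 1)
    if isinstance(node, yaml.SequenceNode):
        for index, child in enumerate(node.value):
            locate(child, path + (index,), out)
    elif isinstance(node, yaml.MappingNode):
        for key_node, value_node in node.value:
            # Assumption: keys are scalars, so key_node.value is the key text.
            locate(value_node, path + (key_node.value,), out)
    return out

yml = "-\n-\n-\n"
doc = yaml.safe_load(yml)
locations = locate(yaml.compose(yml, Loader=yaml.SafeLoader))

# Every entry gets its own location, even though all three values are None.
for index in range(len(doc)):
    print(index, doc[index], locations[(index,)])
```

The trade-off is that the caller has to track the path while walking the loaded data, rather than looking a value up by id().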
The second limitation of this approach is that the location information returned for sequences and mappings (Python lists and dictionaries) reflects the end of those objects, rather than the beginning.
>>> yml = "- foo\n- bar\n- baz\n"
>>> print(yml)
- foo
- bar
- baz
>>> doc, locations = MyLoader.load(yml)
>>> locations[id(doc)]
(3, 0)
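This second limitation appears to come from recording the scanner position (self.line, self.column) after the node has been composed. A possible variant, sketched below, reads node.start_mark in construct_object() instead, so compose_node() no longer needs to be overridden. StartMarkLoader is a hypothetical name, and converting to 1-based line and column numbers is a choice made for this sketch, so its numbers are not directly comparable to the ones above. The singleton limitation still applies.

```python
import yaml

class StartMarkLoader(yaml.SafeLoader):
    """Hypothetical variant of MyLoader that records where each node starts."""

    def __init__(self, stream):
        super().__init__(stream)
        self.locations = {}

    def construct_object(self, node, deep=False):
        obj = super().construct_object(node, deep=deep)
        key = id(obj)
        if key in self.locations:
            # Same object seen more than once (e.g. None): location is ambiguous.
            self.locations[key] = None
        else:
            mark = node.start_mark  # 0-based line and column
            self.locations[key] = (mark.line + 1, mark.column + 1)
        return obj

    @classmethod
    def load(cls, stream):
        loader = cls(stream)
        try:
            return loader.get_single_data(), loader.locations
        finally:
            loader.dispose()

doc, locations = StartMarkLoader.load("- foo\n- bar\n- baz\n")
print(locations[id(doc)])     # location of the list itself
print(locations[id(doc[2])])  # location of 'baz'
```

With this variant the list is reported at the position of its first "-" rather than at the end of the document.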
Despite these limitations, I've still found that this approach can help improve error reporting when working with parsed YAML.