yaml-cpp
option to not eat comments
hrm, I can't find any way to mark this as a feature request. anyways...
I'd like to preserve comments when I parse yaml. This library looks like it can
write comments, but just eats them when it's reading files. Any chance of
adding an option to include comment nodes somewhere?
Original issue reported on code.google.com by [email protected]
on 9 Mar 2012 at 10:58
This is a nice idea, but I'm not sure how it would be presented in the API. For
example,
- foo
# a comment
- bar
To which node should we attach the comment?
Perhaps it would have to be in the event API (which isn't public, yet), and
then if you wanted them, you could handle that as well.
Or, do you simply want comments to be preserved so that when you output an
already-parsed document, it prints the comments you started with?
Original comment by [email protected]
on 10 Mar 2012 at 12:44
I'd attach it to foo, I suppose.. so long as it's something consistent.
preserving the comments is the most important (I don't want to be losing data
in the files I edit) but the ability to edit comments too would be useful.
Original comment by [email protected]
on 10 Mar 2012 at 1:26
I see, makes sense.
I'll mark this as "accepted", but it's a relatively large undertaking, so I'm
not sure when I'll get to it.
Original comment by [email protected]
on 10 Mar 2012 at 6:08
- Changed state: Accepted
thanks :)
the code looks readable, so maybe I could take a stab at it sometime.. we'll
see... I need to get my basic node reading & writing done first of course :)
Original comment by [email protected]
on 10 Mar 2012 at 7:45
Interesting, because I needed this too (though I use Ruby mostly, not C++).
It has really annoyed me that even my top-only comments, which explain what
this yaml file is good for, are gone when I use yaml (in Ruby though).
Just a general comment about # comments. :)
Original comment by [email protected]
on 8 May 2012 at 10:03
@shevegen, thanks for your input!
Original comment by [email protected]
on 12 May 2012 at 12:43
Original comment by [email protected]
on 19 May 2012 at 9:10
- Added labels: Component-Core
I need this for C++ and, separately, for Python.
There are requests for this feature on almost all the various Yaml packages'
issue tracking. And the reason is simple - there are a lot of people who have
human-edited configuration files, perhaps with comments, and wish to preserve
those comments through (light) editing.
If you had this feature (and fixed its streaming issue 154, but everyone else
seems to have a similar issue! :-D), yaml-cpp would be the "best of breed" for
Yaml parsers.
I agree that it's not 100% clear in *all* cases which comment is attached to
which component - but it is in most and simply arbitrarily deciding and
documenting for edge cases would be perfectly fine, users would soon learn.
This would be a very useful feature for me and I'd be willing to devote some
time over the next few months to helping make sure this got done.
Here's how IMHO to proceed.
0. Set up some limited mailing list or just a CC list of people who were
interested.
1. Decide on the comment association rules - "which Yaml element is this
comment associated with?"
2. Decide abstractly (through discussion) where we are going to put the
comments in the API - they'll have to be "optional" somehow for backward
compatibility.
3. Actually change the API to have these fields, without actually implementing
them.
4. Implement the reading and writing parts bit at a time.
5. Profit!
Feel free to contact me at tom (at) swirly dot com.
Original comment by [email protected]
on 14 Apr 2013 at 11:44
Two more things occurred to me.
The first is that you could keep extra whitespace with the comments. No reason
not to, and then the program could do "round trips" - i.e. where you convert
from Yaml and then back and have a file that was byte for byte identical.
Regarding the comment association rules, it seems to me that if we simply say
that the comment is always attached to the thing "right before it", there's
always only one way to do it, and that way is nearly always the intuitive
solution too. In other words, we move backward to find the smallest complete
thing, and attach the comment to that.
And this results also in a neat algorithm! When you get to the end of a token,
you just sweep up all the whitespace and carriage returns and comments,
everything after it until the next token, and attach them to the address of the
first token - and then when it comes time to emit that token, you just drop
this all out again "as is".
Easy to implement, but also really easy to explain to users...!
Original comment by [email protected]
on 15 Apr 2013 at 4:22
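For anyone who wants to experiment with the "sweep up everything after the token" idea from the comment above, here is a minimal sketch under stated assumptions. It is not yaml-cpp code: `Token`, `Scanned`, `ContentEnd`, `Scan` and `Emit` are hypothetical names, the scanner is line-based rather than a real YAML tokenizer, and a `#` inside a quoted scalar is not handled. It only shows that attaching trailing whitespace and comments to the preceding token makes a byte-for-byte round trip straightforward.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; none of this is yaml-cpp API.
struct Token {
  std::string text;    // the content itself, e.g. "- foo"
  std::string trivia;  // whitespace, newlines and comments up to the next token
};

struct Scanned {
  std::string preamble;       // trivia that appears before the first token
  std::vector<Token> tokens;
};

// Index where the content of a line ends: at an inline comment or at the end
// of the line, with trailing whitespace excluded. Returns 0 for blank and
// comment-only lines. Naive: a '#' inside a quoted scalar is not handled.
std::size_t ContentEnd(const std::string& line) {
  std::size_t end = line.size();
  for (std::size_t i = 0; i < line.size(); ++i) {
    const bool starts_comment =
        line[i] == '#' && (i == 0 || line[i - 1] == ' ' || line[i - 1] == '\t');
    if (starts_comment) {
      end = i;
      break;
    }
  }
  while (end > 0 &&
         std::string(" \t\r\n").find(line[end - 1]) != std::string::npos) {
    --end;
  }
  return end;
}

// Splits the input into tokens and attaches everything that follows a token
// (up to the next token) to that token as trivia.
Scanned Scan(const std::string& input) {
  Scanned out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    const std::size_t nl = input.find('\n', pos);
    const std::size_t next = (nl == std::string::npos) ? input.size() : nl + 1;
    assert(next > pos);  // every iteration consumes at least one byte
    const std::string line = input.substr(pos, next - pos);
    const std::size_t end = ContentEnd(line);
    if (end == 0) {
      // Blank or comment-only line: it belongs to the token right before it.
      std::string& sink =
          out.tokens.empty() ? out.preamble : out.tokens.back().trivia;
      sink += line;
    } else {
      out.tokens.push_back({line.substr(0, end), line.substr(end)});
    }
    pos = next;
  }
  return out;
}

// Emits each token followed by its trivia "as is".
std::string Emit(const Scanned& scanned) {
  std::string out = scanned.preamble;
  for (const Token& token : scanned.tokens) {
    out += token.text;
    out += token.trivia;
  }
  return out;
}
```

With this rule, the comment in the `- foo` / `# a comment` / `- bar` example from earlier in the thread ends up in the trivia of `- foo`, and `Emit(Scan(input))` reproduces the input unchanged. Whether yaml-cpp should emit the stored trivia verbatim is a separate design question.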
Don't all rush up at once! :-)
I have some spare time in May so if nothing comes of this, I might fork the
repo and make the change. This is not a guarantee, however...
Original comment by [email protected]
on 19 Apr 2013 at 6:06
Sorry, I've been away for a bit.
I'll consider this, but I don't think I want to offer perfect in/out round
trips. I don't want yaml-cpp to output nasty YAML. But I do see value in
keeping comments, so I'll think about it.
Original comment by [email protected]
on 3 May 2013 at 1:09
Has a decision been made regarding attaching comments to nodes during parsing? Edit: Or alternatively, treating them as fake nodes, given maps in yaml-cpp now store their parsed order. For example, they could be inserted into sequences easily enough, or into node_map as a special pair. This behavior could be gated by an option so it doesn't break existing code. Initial feeling is this would be more of a hack than a solution.
Additionally, it seems odd that there is no OnComment event handler.
I haven't considered this issue in a while, but I'm happy to accept patches. I think an event handler for comments would be pretty uncontroversial.
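To make the event-handler idea concrete, here is a rough sketch of what an optional comment callback could look like. Everything in it is an assumption for illustration: `LineEventHandler`, `ScanLines` and `CommentCollector` are hypothetical names, the scanner is a toy line reader rather than yaml-cpp's parser, and `OnComment` is not an existing yaml-cpp method. The point is only that a callback with a default no-op body keeps existing handlers working unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler interface; this is not yaml-cpp's EventHandler.
class LineEventHandler {
 public:
  virtual ~LineEventHandler() = default;
  virtual void OnContent(int line, const std::string& text) = 0;
  // Default no-op, so handlers written before comments existed still compile
  // and behave as before.
  virtual void OnComment(int /*line*/, const std::string& /*text*/) {}
};

// Toy scanner: reports the content and the comment of each line separately.
// Line numbers are zero-based. A '#' inside a quoted scalar is not handled.
void ScanLines(const std::string& input, LineEventHandler& handler) {
  std::istringstream in(input);
  std::string line;
  for (int number = 0; std::getline(in, line); ++number) {
    std::size_t hash = std::string::npos;
    for (std::size_t i = 0; i < line.size(); ++i) {
      if (line[i] == '#' &&
          (i == 0 || line[i - 1] == ' ' || line[i - 1] == '\t')) {
        hash = i;
        break;
      }
    }
    // Everything before the comment, with trailing whitespace dropped.
    const std::string before = line.substr(0, hash);
    const std::size_t last = before.find_last_not_of(" \t\r");
    if (last != std::string::npos) {
      handler.OnContent(number, before.substr(0, last + 1));
    }
    if (hash != std::string::npos) {
      handler.OnComment(number, line.substr(hash + 1));
    }
  }
}

// Example handler that opts in to comments and records where they occurred.
struct CommentCollector : LineEventHandler {
  std::vector<std::string> content;
  std::vector<std::pair<int, std::string>> comments;

  void OnContent(int /*line*/, const std::string& text) override {
    content.push_back(text);
  }
  void OnComment(int line, const std::string& text) override {
    assert(line >= 0);  // line numbers are zero-based and never negative
    comments.emplace_back(line, text);
  }
};
```

A handler like this still leaves the association question open: it reports where a comment occurred, and it is up to the consumer to decide which node the comment belongs to.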
I also think this feature would be useful. It would be nice to be able to preserve comments the user put into a file when rewriting the file (as stated above).
@jbeder, I'm trying to see about adding comments. Reading through the concerns above and in other related tickets, it looks like folks expect a comment to come after the thing it describes, but I think it may make sense to have both a "pre" and a "post" comment. For example:
# comment before a sequence
- first item
- # comment before a scalar
  second item # trailing comment
# this is where it gets confusing
# I guess it could be possible to detect equivalent Mark.column of consecutive
# comments to determine that it appears as though someone is continuing.
# in my mind, this is clearly a comment describing the last item, but is inconsistent
# with where the comment for the "second item" was placed
- last item
I guess I have a couple questions, then I'll probably have more should I make any progress.
- At one point you stated if it was a part of the Event system it wouldn't be that big of a deal. I don't quite understand how yaml-cpp would be able to re-emit without the data stored on the node_data. Am I missing something, or will comments need to be stored on node_data? Alternatively they could be their own nodes, but that loses context. Perhaps the idea is that two NodeBuilders could be created, one that uses comments and one that ignores.
- Assuming that a comment should be attached to the previous node, what is the proper way to retrieve the previous node? Is it NodeBuilder.m_stack.back()?
- If a comment before a node should reside on the following node, any ideas where to store the previous comment until a node exists?
p.s. I am a huge fan of "I don't want yaml-cpp to output nasty YAML." And if it works, one might be able to create "yaml-format" utilizing "yaml-cpp".
Any progress on this one? The ability to edit commented YAML-files would be awesome.
@SimplyKnownAsG From what I can see, Nodes in yaml-cpp are either scalars or collections (mappings or sequences). Because of that yaml-cpp can easily represent things like complex keys (keys that aren't scalars) by implementing the subscript operator like this:
Node operator[](const Key& key);
This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").
Excluding that idea would leave us with (at least) the second solution, to make comments a part of Node-objects. I'm planning to take a look at the code some more and see if/how this could be done.
Wouldn't it be good to extend this feature to include empty lines as well? Because those are also part of the readability of a document.
Would be great feature!
I also need this feature. It is important to me because I serialize only one YAML node to send to a server, where I verify the configuration's consistency; the client and the host execution should be independent.
In the example below, I send node["configurationToSendServer"] and its mark positions to the server so that I can report error positions. This works perfectly if there are no comments inside configurationToSendServer. If there are comments, they are eaten when the node is serialized, so they do not appear on the server side and the Mark information is wrong.
name: example
description: "some decription"
configurationToSendServer:
  name: execution
  # comment to describe something
  # the next line is wrong configuration for the server
  serverProperty: foobarWrongProperty
A lot of work, but starting with taking care of the simplest case, i.e., separate comment lines above the actual thing, would help a lot!
> This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").
I think it can be workable. You simply define a new value type, a "CommentKey". The key could simply be a unique string, for example "#124" for a comment on line 124. You could similarly have a Comment value, or it could simply be a scalar. The nice thing here is that by containing whitespace and comments in the tree, you can emit the tree back out as it came in, after modifying things.
I would personally like it so that Nodes could have comments attached to them if the comment appears on the same line; this could simply be an attribute in the YAML node, and can be added to the tree structure. I like this, because I can construct a template configuration tree with comments, unify it with the user config, then recreate it (without having to muck with the Emitter)
> This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").
I am inclined to agree with this. Comments cannot be a value, especially with lists:
- first item
- # comment
  second item # trailing comment
If we treat the comment as a value, then I suppose there are actually 4 items here: ["first item", # comment, "second item", # trailing comment], but the actual YAML has only 2: ['first item', 'second item'].
Instead, I think the Node needs to own the comments associated with it. I would store them in a few strings: before, inline, and after, or head, text, and tail (etc. regarding the naming):
# Introductory Comment (which still belongs to data as the first item)
# ...
# The following white space is preserved as part of the comment
# The data
data: # Still belongs to the data entry
# dict head/leading comment
? # dict inline comment
dict # dict trailing comment
: # 1234 leading comment
1234 # 1234 trailing comment
# Trailing comment of the root node of the document
Are there situations where this sort of division is not ideal? Sure. Commenting out an item from the end of a list, for example, will add it to the beginning of the next item. But
- The division can be made "smarter" later, while still keeping the divisions I'm proposing.
- When adding/changing content the comment will still generally be there, though an extra key/value could be inserted between its final location and where it ends up. Only when removing content do we risk deleting the comments, but we're already deleting content so it's still a significant improvement to only delete a couple comments vs eating all of them.
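As a rough illustration of the "node owns its comments" proposal, here is a minimal sketch, assuming a flat sequence of scalars. The names `Comments`, `Item` and `EmitSequence` are hypothetical, and yaml-cpp's `Node` has no such fields today. The sketch stores before/inline/after comments next to each value and shows that removing an item drops only the comments that item owns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical layout for illustration; not part of yaml-cpp.
struct Comments {
  std::vector<std::string> before;  // full-line comments above the value
  std::string inline_text;          // comment on the same line as the value
  std::vector<std::string> after;   // full-line comments below the value
};

struct Item {
  std::string value;
  Comments comments;
};

// Writes one full-line comment; each stored comment must be a single line.
void AppendCommentLine(std::string& out, const std::string& text) {
  assert(text.find('\n') == std::string::npos);
  out += "# " + text + "\n";
}

// Emits a block sequence and puts every comment back next to its owner.
std::string EmitSequence(const std::vector<Item>& items) {
  std::string out;
  for (const Item& item : items) {
    for (const std::string& text : item.comments.before) {
      AppendCommentLine(out, text);
    }
    out += "- " + item.value;
    if (!item.comments.inline_text.empty()) {
      out += "  # " + item.comments.inline_text;
    }
    out += "\n";
    for (const std::string& text : item.comments.after) {
      AppendCommentLine(out, text);
    }
  }
  return out;
}
```

Because the comments travel with the item, inserting a new item between two existing ones leaves every existing comment attached to its owner, and erasing an item loses only that item's comments. Deciding at parse time which of the three slots a given comment goes into is the part this sketch leaves out.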