yaml-cpp option to not eat comments

hrm, I can't find any way to mark this as a feature request. anyways...

I'd like to preserve comments when I parse yaml. This library looks like it can 
write comments, but just eats them when it's reading files. Any chance of 
adding an option to include comment nodes somewhere?

Original issue reported on code.google.com by [email protected] on 9 Mar 2012 at 10:58

Mar 30 '15 02:03 GoogleCodeExporter

This is a nice idea, but I'm not sure how it would be presented in the API. For 
example,

- foo
# a comment
- bar

To which node should we attach the comment?

Perhaps it would have to be in the event API (which isn't public, yet), and 
then if you wanted them, you could handle that as well.

Or, do you simply want comments to be preserved so that when you output an 
already-parsed document, it prints the comments you started with?

Original comment by [email protected] on 10 Mar 2012 at 12:44

Mar 30 '15 02:03 GoogleCodeExporter

I'd attach it to foo, I suppose.. so long as it's something consistent.

preserving the comments is the most important (I don't want to be losing data 
in the files I edit) but the ability to edit comments too would be useful.

Original comment by [email protected] on 10 Mar 2012 at 1:26

Mar 30 '15 02:03 GoogleCodeExporter

I see, makes sense.

I'll mark this as "accepted", but it's a relatively large undertaking, so I'm 
not sure when I'll get to it.

Original comment by [email protected] on 10 Mar 2012 at 6:08

Changed state: Accepted

Mar 30 '15 02:03 GoogleCodeExporter

thanks :)
the code looks readable, so maybe I could take a stab at it sometime.. we'll 
see... I need to get my basic node reading & writing done first of course :)

Original comment by [email protected] on 10 Mar 2012 at 7:45

Mar 30 '15 02:03 GoogleCodeExporter

Interesting, because I needed this too (though I use Ruby mostly, not C++).

It has really annoyed me that even my top-only comments, which explain what 
this yaml file is good for, are gone when I use yaml (in Ruby though).

Just a general comment about # comments. :)

Original comment by [email protected] on 8 May 2012 at 10:03

Mar 30 '15 02:03 GoogleCodeExporter

@shevegen, thanks for your input!

Original comment by [email protected] on 12 May 2012 at 12:43

Mar 30 '15 02:03 GoogleCodeExporter

Original comment by [email protected] on 19 May 2012 at 9:10

Added labels: Component-Core

Mar 30 '15 02:03 GoogleCodeExporter

I need this for C++ and, separately, for Python.

There are requests for this feature on almost all the various Yaml packages' 
issue tracking.  And the reason is simple - there are a lot of people who have 
human-edited configuration files, perhaps with comments, and wish to preserve 
those comments through (light) editing.

If you had this feature (and fixed its streaming issue 154, but everyone else 
seems to have a similar issue! :-D), yaml-cpp would be the "best of breed" for 
Yaml parsers.


I agree that it's not 100% clear in *all* cases which comment is attached to 
which component, but it is in most. Simply deciding the edge cases arbitrarily 
and documenting them would be perfectly fine; users would soon learn.

This would be a very useful feature for me and I'd be willing to devote some 
time over the next few months to helping make sure this got done. 

Here's how IMHO to proceed.

0. Set up some limited mailing list or just a CC list of people who were 
interested.
1. Decide on the comment association rules - "which Yaml element is this 
comment associated with?"
2. Decide abstractly (through discussion) where we are going to put the 
comments in the API - they'll have to be "optional" somehow for backward 
compatibility.
3. Actually change the API to have these fields, without actually implementing 
them.
4. Implement the reading and writing parts a bit at a time.
5. Profit!


Feel free to contact me at tom (at) swirly dot com.

Original comment by [email protected] on 14 Apr 2013 at 11:44

Mar 30 '15 02:03 GoogleCodeExporter

Two more things occurred to me.

The first is that you could keep extra whitespace with the comments.  No reason 
not to, and then the program could do "round trips" - i.e. where you convert 
from Yaml and then back and have a file that was byte for byte identical.

Regarding the comment association rules, it seems to me that if we simply say 
that the comment is always attached to the thing "right before it", there's 
always only one way to do it, and that way is nearly always the intuitive 
solution too.  In other words, we move backward to find the smallest complete 
thing, and attach the comment to that.

And this results also in a neat algorithm!  When you get to the end of a token, 
you just sweep up all the whitespace and carriage returns and comments, 
everything after it until the next token, and attach them to the address of the 
first token - and then when it comes time to emit that token, you just drop 
this all out again "as is".

Easy to implement, but also really easy to explain to users...!
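
The "sweep up everything after a token" idea can be sketched outside of yaml-cpp. This is only a toy under loud assumptions: it splits on whitespace instead of using a real YAML scanner, it treats any `#` that starts a word as a comment (so quoted strings are not handled), and `Token`, `Document`, `Tokenize` and `Emit` are made-up names, not yaml-cpp API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: the significant text plus everything that follows it
// (whitespace, newlines, comments) up to the start of the next token.
struct Token {
  std::string text;
  std::string trailing;
};

struct Document {
  std::string leading;  // trivia that appears before the first token
  std::vector<Token> tokens;
};

static bool IsSpace(char c) {
  return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Consume whitespace and '#' comments starting at pos; return what was consumed.
static std::string SweepTrivia(const std::string& in, std::size_t& pos) {
  const std::size_t start = pos;
  while (pos < in.size()) {
    if (IsSpace(in[pos])) {
      ++pos;
    } else if (in[pos] == '#') {
      while (pos < in.size() && in[pos] != '\n') ++pos;  // comment runs to end of line
    } else {
      break;
    }
  }
  return in.substr(start, pos - start);
}

Document Tokenize(const std::string& in) {
  Document doc;
  std::size_t pos = 0;
  doc.leading = SweepTrivia(in, pos);
  while (pos < in.size()) {
    const std::size_t start = pos;
    while (pos < in.size() && !IsSpace(in[pos])) ++pos;  // toy "token": one word
    Token tok;
    tok.text = in.substr(start, pos - start);
    tok.trailing = SweepTrivia(in, pos);  // attach trivia to the token before it
    doc.tokens.push_back(tok);
  }
  return doc;
}

// Drop every token and its trailing trivia back out "as is".
std::string Emit(const Document& doc) {
  std::string out = doc.leading;
  for (const Token& t : doc.tokens) out += t.text + t.trailing;
  return out;
}
```

Because every input character lands in exactly one `text`, `trailing` or `leading` string, `Emit(Tokenize(s))` reproduces `s` byte for byte, and changing a token's `text` before emitting leaves the comment that followed it in place.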

Original comment by [email protected] on 15 Apr 2013 at 4:22

Mar 30 '15 02:03 GoogleCodeExporter

Don't all rush up at once!  :-)

I have some spare time in May so if nothing comes of this, I might fork the 
repo and make the change.  This is not a guarantee, however...

Original comment by [email protected] on 19 Apr 2013 at 6:06

Mar 30 '15 02:03 GoogleCodeExporter

Sorry, I've been away for a bit.

I'll consider this, but I don't think I want to offer perfect in/out round 
trips. I don't want yaml-cpp to output nasty YAML. But I do see value in 
keeping comments, so I'll think about it.

Original comment by [email protected] on 3 May 2013 at 1:09

Mar 30 '15 02:03 GoogleCodeExporter

Has a decision been made regarding attaching comments to nodes during parsing? Edit: Or alternatively, treating them as fake nodes, given maps in yaml-cpp now store their parsed order. For example, they could be inserted into sequences easily enough, or into node_map as a special pair. This behavior could be gated by an option so it doesn't break existing code. Initial feeling is this would be more of a hack than a solution.

Additionally, it seems odd that there is no OnComment event handler.
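
To make the event-handler idea concrete, here is a rough sketch of what an opt-in comment event could look like. Nothing in it is yaml-cpp API: `Pos`, `Handler`, `OnContent`, `OnComment` and `ScanLines` are hypothetical names, and the scanner is a line-based toy that treats a `#` at the start of a line or after whitespace as a comment (it ignores quoting). The point is only that a default no-op `OnComment` lets existing handlers keep working unchanged.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical source position (zero-based line and column).
struct Pos {
  int line;
  int column;
};

// Hypothetical handler interface. OnComment has an empty default body, so a
// handler that does not care about comments needs no changes.
class Handler {
 public:
  virtual ~Handler() = default;
  virtual void OnContent(const Pos& pos, const std::string& text) = 0;
  virtual void OnComment(const Pos& /*pos*/, const std::string& /*text*/) {}
};

// Index of the '#' that starts a comment on this line, or npos if none.
static std::size_t FindCommentStart(const std::string& line) {
  for (std::size_t i = 0; i < line.size(); ++i) {
    if (line[i] == '#' && (i == 0 || line[i - 1] == ' ' || line[i - 1] == '\t'))
      return i;
  }
  return std::string::npos;
}

// Toy scanner: reports the non-comment part of each line, then its comment.
void ScanLines(const std::string& input, Handler& handler) {
  std::istringstream stream(input);
  std::string line;
  for (int lineNo = 0; std::getline(stream, line); ++lineNo) {
    const std::size_t hash = FindCommentStart(line);
    const std::string content = line.substr(0, hash);  // whole line if no comment
    const std::size_t first = content.find_first_not_of(" \t");
    if (first != std::string::npos) {
      const std::size_t last = content.find_last_not_of(" \t");
      handler.OnContent({lineNo, static_cast<int>(first)},
                        content.substr(first, last - first + 1));
    }
    if (hash != std::string::npos)
      handler.OnComment({lineNo, static_cast<int>(hash)}, line.substr(hash + 1));
  }
}

// Example handler that opts in to comments and ignores everything else.
struct CommentCollector : Handler {
  std::vector<std::string> comments;
  void OnContent(const Pos&, const std::string&) override {}
  void OnComment(const Pos&, const std::string& text) override {
    comments.push_back(text);
  }
};
```

A handler like `CommentCollector` receives the comment text (without the leading `#`) along with its position; what to attach it to is left to whoever consumes the events.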

Oct 14 '16 21:10 Mortal42

I haven't considered this issue in a while, but I'm happy to accept patches. I think an event handler for comments would be pretty uncontroversial.

Oct 14 '16 22:10 jbeder

I also think this feature would be useful. It would be nice to be able to preserve comments the user put into a file when rewriting the file (as stated above).

Jul 17 '17 15:07 tepperly

@jbeder, I'm looking into adding comment support. Reading through the concerns above and in other related tickets, it looks like folks expect a comment to come after the thing it describes, but I think it may make sense to have both a "pre" and a "post" comment. For example:

# comment before a sequence
- first item
- # comment before a scalar
  second item # trailing comment
              # this is where it gets confusing
              # I guess it could be possible to detect equivalent Mark.column of consecutive
              # comments to determine that it appears as though someone is continuing.
# in my mind, this is clearly a comment describing the last item, but is inconsistent
# with where the comment for the "second item" was placed
- last item
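
The column heuristic mentioned in that example could look roughly like the sketch below. It assumes the scanner already records, for each comment, its line, its column, and whether it follows a scalar on the same line; `CommentInfo`, `Placement` and `Classify` are hypothetical names, not yaml-cpp types.

```cpp
#include <cstddef>
#include <vector>

enum class Placement {
  Leading,       // on its own line, describes whatever comes next
  Trailing,      // follows a scalar on the same line
  Continuation   // on its own line, continues the trailing comment above it
};

// Hypothetical record a scanner might keep per comment (zero-based positions).
struct CommentInfo {
  int line;
  int column;
  bool followsScalarOnLine;
};

// A comment on its own line continues the previous comment only if that one was
// trailing (or itself a continuation), sits on the line directly above, and
// starts in the same column. Comments are expected in source order.
std::vector<Placement> Classify(const std::vector<CommentInfo>& comments) {
  std::vector<Placement> result;
  result.reserve(comments.size());
  for (std::size_t i = 0; i < comments.size(); ++i) {
    const CommentInfo& c = comments[i];
    if (c.followsScalarOnLine) {
      result.push_back(Placement::Trailing);
      continue;
    }
    const bool continuesPrevious =
        i > 0 && result[i - 1] != Placement::Leading &&
        comments[i - 1].line + 1 == c.line &&
        comments[i - 1].column == c.column;
    result.push_back(continuesPrevious ? Placement::Continuation
                                       : Placement::Leading);
  }
  return result;
}
```

Run on the example above, this marks the three indented lines under "trailing comment" as continuations of it, and the two column-0 lines before "last item" as leading comments for that item.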

I guess I have a couple questions, then I'll probably have more should I make any progress.

1. At one point you stated that if it were part of the Event system it wouldn't be that big of a deal. I don't quite understand how yaml-cpp would be able to re-emit comments without the data being stored on the node_data. Am I missing something, or will comments need to be stored on node_data? Alternatively they could be their own nodes, but that loses context. Perhaps the idea is that two NodeBuilders could be created, one that uses comments and one that ignores them.
2. Assuming that a comment should be attached to the previous node, what is the proper way to retrieve the previous node? Is it NodeBuilder.m_stack.back()?
3. If a comment before a node should reside on the following node, any ideas where to store the previous comment until a node exists?

p.s. I am a huge fan of "I don't want yaml-cpp to output nasty YAML." And if it works, one might be able to create a "yaml-format" utilizing "yaml-cpp".

Mar 21 '18 05:03 SimplyKnownAsG

Any progress on this one? The ability to edit commented YAML-files would be awesome.

@SimplyKnownAsG From what I can see, Nodes in yaml-cpp are either scalars or collections (mappings or sequences). Because of that yaml-cpp can easily represent things like complex keys (keys that aren't scalars) by implementing the subscript operator like this:

Node operator[](const Key& key);

This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").

Excluding that idea would leave us with (at least) the second solution, to make comments a part of Node-objects. I'm planning to take a look at the code some more and see if/how this could be done.

Wouldn't it be good to extend this feature to include empty lines as well? Because those are also part of the readability of a document.

Mar 12 '20 10:03 mazen-mardini

Would be great feature!

May 27 '20 13:05 nikich340

I also need this feature. For me this is important because I serialize only one YAML node to send to a server, where I verify the configuration's consistency, because the client and the host should execute independently.

In the example below I send node["configurationToSendServer"] and its mark positions to the server so that I can report error positions. This works perfectly if there are no comments inside configurationToSendServer. When there are comments, they are eaten from the serialized node and do not appear on the server side, which makes the Mark information wrong.

name: example
description: "some description"
configurationToSendServer:
    name: execution
    # comment to describe something
    # the next line is wrong configuration for the server
    serverProperty: foobarWrongProperty

Dec 16 '20 11:12 filipebeavis

A lot of work, but starting with taking care of the simplest case, i.e., separate comment lines above the actual thing, would help a lot!

Nov 13 '21 04:11 seisowl

> This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").

I think it can be workable. You simply define a new value type, a "CommentKey". The key could simply be a unique string, for example "#124" for a comment on line 124. You could similarly have a Comment value, or it could simply be a scalar. The nice thing here is that by containing whitespace and comments in the tree, you can emit the tree back out as it came in, after modifying things.

I would personally like it so that Nodes could have comments attached to them if the comment appears on the same line; this could simply be an attribute in the YAML node, and can be added to the tree structure. I like this, because I can construct a template configuration tree with comments, unify it with the user config, then recreate it (without having to muck with the Emitter)

Jun 29 '22 14:06 nathanieltagg

> This is very nice and it makes Node versatile. Saying that Node can be a comment would lead to absurdities, at least for the above reason that comments cannot act as a key in a mapping. There are other reasons, such as the fact that comments can be placed between a key and a value, or that it can be placed before a document even starts (before a "---").

I am inclined to agree with this. Comments cannot be a value, especially with lists:

- first item
- # comment
  second item # trailing comment

If we treat the comment as a value, then I suppose there are actually 4 items here: ["first item", # comment, "second item", # trailing comment], but the actual YAML is only 2 ['first item', 'second item'].

Instead, I think the Node needs to own the comments associated with it. I would store them in a few strings: before, inline, and after, or head, text, and tail (etc. regarding the naming):

# Introductory Comment (which still belongs to data as the first item)
# ...
# The following white space is preserved as part of the comment

# The data
data: # Still belongs to the data entry
    # dict head/leading comment
    ? # dict inline comment
      dict # dict trailing comment
    : # 1234 leading comment
      1234 # 1234 trailing comment

# Trailing comment of the root node of the document

Are there situations where this sort of division is not ideal? Sure. Commenting out an item at the end of a list, for example, will attach it to the beginning of the next item. But:

1. The division can be made "smarter" later, while still keeping the divisions I'm proposing.
2. When adding/changing content the comment will still generally be there, though an extra key/value could be inserted between its final location and where it ends up. Only when removing content do we risk deleting the comments, but we're already deleting content, so only losing a couple of comments is still a significant improvement over eating all of them.
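
A minimal sketch of the "Node owns its comments" layout, to show how editing a value leaves the comments alone. It is deliberately not built on yaml-cpp: `Comments`, `Item` and `EmitSequence` are hypothetical names, only a flat block sequence of scalars is handled, and the comment strings are assumed to be stored verbatim (including `#`, blank lines and newlines).

```cpp
#include <string>
#include <vector>

// Hypothetical per-node comment storage, following the before/inline/after split.
struct Comments {
  std::string head;      // whole lines before the item, newline-terminated
  std::string sameLine;  // comment after the value on the same line, e.g. "# note"
  std::string tail;      // whole lines after the item, newline-terminated
};

// Hypothetical sequence entry: a scalar value plus the comments it owns.
struct Item {
  std::string value;
  Comments comments;
};

// Emit a block sequence, writing each item's comments around its value.
std::string EmitSequence(const std::vector<Item>& items) {
  std::string out;
  for (const Item& item : items) {
    out += item.comments.head;
    out += "- " + item.value;
    if (!item.comments.sameLine.empty()) out += " " + item.comments.sameLine;
    out += "\n";
    out += item.comments.tail;
  }
  return out;
}
```

Changing `items[i].value` and calling `EmitSequence` again keeps every comment where it was; erasing an item from the vector drops only the comments that item owned, which matches the trade-off described above.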

Dec 13 '22 18:12 SirNate0
