yq
                                
                                
                                
                                    yq copied to clipboard
                            
                            
                            
                        yq 3.3.1 is breaking an original indentation level for list items
Describe the bug yq 3.3.1 is breaking an original indentation level for list items
Input Yaml Concise yaml document(s) (as simple as possible to show the bug) data1.yml:
this:
- a
- b
- c
Command The command you ran:
yq write data1.yml
Actual behavior
this:
  - a
  - b
  - c
Expected behavior
this:
- a
- b
- c
Additional context Add any other context about the problem here.
Yeah this is a result of updating to the latest https://godoc.org/gopkg.in/yaml.v3 package (which is the underlying yaml parser).
It used to have an issue the other way around (if you had indentation, it would remove it) - but many people complained and so now it does this. For what it's worth, I actually think this is better - but yeah it doesn't preserve formatting perfectly.
Do you mean this behavior is the expected result? :)
No - just a known issue.
I've added it to the readme too, perhaps future version of yaml.v3 will fix it
Thank you! 👍🏼
The YAML spec does seem to suggest that the un-indented form is canonical (under 2.3) https://yaml.org/spec/1.2.2/#21-collections
@mikefarah, I have a generalized AST-based solution for this problem and have built it out enough to satisfy my use case (still have some edge cases to burn down where the renderer un-chomps multiline strings and reformats inline maps and sequences, but indentation is perfect).
Would you accept a contribution, e.g. the API proposed in #1359? It is a great deal of added complexity but can be isolated.
Great deal of complexity doesn't sound great...I'd be reluctant to add anything too complex in, as it would be something I would have to support going forward.
yq has its own AST, is it using that? Can I have a look at the code?
Great deal of complexity doesn't sound great...I'd be reluctant to add anything too complex in, as it would be something I would have to support going forward.
Fair! I can elaborate a bit more about why I think it's very important for yq to have this feature—succinctly, yq is not suitable as a "Swiss army knife utility" for large-scale changes if using it results in large-scale changes that consist largely of whitespace. yq is probably the best YAML tool out there in terms of fidelity to original YAML document, but the value of its ability to mutate YAML in place is weakened by its behavior in the face of idiosyncratic YAML formatting conventions in human-authored YAML. The option to set indentation helps, but many users have shared examples of cases where indentation alone is not sufficient.
this:
 is:
         - valid
         - yaml:
           - unfortunately: for us
Can I have a look at the code?
So to be clear, I've written a postprocessor of yq stdout that takes yq input and restores most of its original formatting including indentation. It's currently in Python and based on PyYAML AST, which was simply quickest for the project I'm working on but not the ideal final destination for such an algorithm. I'll describe the algo at a high level rather than burden you with the code:
- Analyze old AST with depth-first search, tracking the indentation of each node against its JSON path. There's a bit of nuance here because maps, sequences, and scalar nodes all have whitespace-sensitive concerns depending on how they're specified, but I have access to a massive regression suite of human-authored YAML and feel quite confident this analysis is sound, complete, and testable.
 - Traverse new AST depth-first, mutating output along the way to restore original formatting to preexisting JSON paths and falling back to sensible defaults for JSON paths that didn't exist before or have fundamentally changed their semantics. The tricky things here are vast due to multiline strings, comment- and whitespace-only lines, trailing whitespaces on unchanged lines, single-line string quote style, and inline maps/sequences.
 
This can be implemented as a "best effort" algorithm that responsibly bails with a warn log if any values of the rendered YAML AST change at all, safeguarding against potential defects.
I haven't yet dug into yq to understand the optimal integration point. If you think gopkg.in/yaml.v3 is the optimal integration point and this is a pass-through API, I would be quite open to that but figured I'd start here where I'd like to pass a flag like --preserve-indent as proposed by another user in #1359.