yq icon indicating copy to clipboard operation
yq copied to clipboard

yq 3.3.1 is breaking an original indentation level for list items

Open posquit0 opened this issue 5 years ago • 8 comments

Describe the bug yq 3.3.1 is breaking an original indentation level for list items

Input Yaml Concise yaml document(s) (as simple as possible to show the bug) data1.yml:

this:
- a
- b
- c

Command The command you ran:

yq write data1.yml

Actual behavior

this:
  - a
  - b
  - c

Expected behavior

this:
- a
- b
- c

Additional context Add any other context about the problem here.

posquit0 avatar Jun 12 '20 06:06 posquit0

Yeah this is a result of updating to the latest https://godoc.org/gopkg.in/yaml.v3 package (which is the underlying yaml parser).

It used to have an issue the other way around (if you had indentation, it would remove it) - but many people complained and so now it does this. For what it's worth, I actually think this is better - but yeah it doesn't preserve formatting perfectly.

mikefarah avatar Jun 12 '20 06:06 mikefarah

Do you mean this behavior is the expected result? :)

posquit0 avatar Jun 18 '20 06:06 posquit0

No - just a known issue.

I've added it to the readme too, perhaps future version of yaml.v3 will fix it

mikefarah avatar Sep 16 '20 05:09 mikefarah

Thank you! 👍🏼

posquit0 avatar Sep 16 '20 05:09 posquit0

The YAML spec does seem to suggest that the un-indented form is canonical (under 2.3) https://yaml.org/spec/1.2.2/#21-collections

robin-wayve avatar May 31 '22 13:05 robin-wayve

@mikefarah, I have a generalized AST-based solution for this problem and have built it out enough to satisfy my use case (still have some edge cases to burn down where the renderer un-chomps multiline strings and reformats inline maps and sequences, but indentation is perfect).

Would you accept a contribution, e.g. the API proposed in #1359? It is a great deal of added complexity but can be isolated.

stristr avatar Jan 06 '24 19:01 stristr

Great deal of complexity doesn't sound great...I'd be reluctant to add anything too complex in, as it would be something I would have to support going forward.

yq has its own AST, is it using that? Can I have a look at the code?

mikefarah avatar Jan 07 '24 06:01 mikefarah

Great deal of complexity doesn't sound great...I'd be reluctant to add anything too complex in, as it would be something I would have to support going forward.

Fair! I can elaborate a bit more about why I think it's very important for yq to have this feature—succinctly, yq is not suitable as a "Swiss army knife utility" for large-scale changes if using it results in large-scale changes that consist largely of whitespace. yq is probably the best YAML tool out there in terms of fidelity to original YAML document, but the value of its ability to mutate YAML in place is weakened by its behavior in the face of idiosyncratic YAML formatting conventions in human-authored YAML. The option to set indentation helps, but many users have shared examples of cases where indentation alone is not sufficient.

this:
 is:
         - valid
         - yaml:
           - unfortunately: for us

Can I have a look at the code?

So to be clear, I've written a postprocessor of yq stdout that takes yq input and restores most of its original formatting including indentation. It's currently in Python and based on PyYAML AST, which was simply quickest for the project I'm working on but not the ideal final destination for such an algorithm. I'll describe the algo at a high level rather than burden you with the code:

  • Analyze old AST with depth-first search, tracking the indentation of each node against its JSON path. There's a bit of nuance here because maps, sequences, and scalar nodes all have whitespace-sensitive concerns depending on how they're specified, but I have access to a massive regression suite of human-authored YAML and feel quite confident this analysis is sound, complete, and testable.
  • Traverse new AST depth-first, mutating output along the way to restore original formatting to preexisting JSON paths and falling back to sensible defaults for JSON paths that didn't exist before or have fundamentally changed their semantics. The tricky things here are vast due to multiline strings, comment- and whitespace-only lines, trailing whitespaces on unchanged lines, single-line string quote style, and inline maps/sequences.

This can be implemented as a "best effort" algorithm that responsibly bails with a warn log if any values of the rendered YAML AST change at all, safeguarding against potential defects.

I haven't yet dug into yq to understand the optimal integration point. If you think gopkg.in/yaml.v3 is the optimal integration point and this is a pass-through API, I would be quite open to that but figured I'd start here where I'd like to pass a flag like --preserve-indent as proposed by another user in #1359.

stristr avatar Jan 07 '24 19:01 stristr