
yq write strips completely blank lines from the output

Open scanfield opened this issue 3 years ago • 28 comments

Is your feature request related to a problem? Please describe.

foo:
  bar: 1

  baz: 2

when run through yq w - foo.baz 3

produces

foo:
  bar: 1
  baz: 3

Describe the solution you'd like

Keep my extra blank line (it's better for readability / produces less of a diff)

scanfield avatar Aug 13 '20 04:08 scanfield

Same story. Sorry, but this issue looks more like a bug than an enhancement. When you process a YAML file with yq, it corrupts the whole file.

warder avatar Sep 24 '20 05:09 warder

Any update on this? This would be a really nice feature.

AceHack avatar Oct 10 '20 00:10 AceHack

Any update on this ? It's getting difficult when it comes to readability

sathiyams avatar Nov 23 '20 11:11 sathiyams

This is an effect of the underlying yaml parser, an issue was raised there https://github.com/go-yaml/yaml/issues/627 - the owner said

...the content when re-encoded will not have its original textual representation preserved. An effort is made to render the data pleasantly, and to preserve comments near the data they describe, though.

mikefarah avatar Nov 23 '20 23:11 mikefarah

I've been dealing with this issue for a couple of days when updating very large YAML files and found a workaround using diff & patch commands that restores the stripped blank lines in most of the cases. Suppose you have the following YAML file:

doc:
  version: 1.0.0
  name: numbers & letters

numbers:
  - 1

letters:
  - a

we call this file a.yaml. Now let's update the version using yq and store the result in new file a-updated.yaml:

yq e '.doc.version = "1.0.1"' a.yaml > a-updated.yaml

as expected, command above stripped all blank lines so a-updated.yaml looks like:

doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
letters:
  - a

at this point, the first step to get the blank lines back is to create a diff file that ignores blank lines changes:

diff -U0 -w -b --ignore-blank-lines a.yaml a-updated.yaml > a.diff

a.diff looks like this:

--- a.yaml	2021-04-30 15:28:38.000000000 -0500
+++ a-updated.yaml	2021-04-30 15:18:53.000000000 -0500
@@ -2 +2 @@
-  version: 1.0.0
+  version: 1.0.1

then final step is to patch original file with the diff:

patch a.yaml < a.diff

after that, the original file looks like:

doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1

letters:
  - a

the issue comes when the updated line is right before a blank line. For example, let's add an element to one of the arrays:

yq e '.numbers += 2' a.yaml > a-updated.yaml

the updated file is now:

doc:
  version: 1.0.1
  name: numbers & letters
numbers:
  - 1
  - 2
letters:
  - a

if we generate the diff file as before we'll get the following:

--- a.yaml	2021-04-30 15:30:22.000000000 -0500
+++ a-updated.yaml	2021-04-30 15:35:26.000000000 -0500
@@ -7 +6 @@
-
+  - 2

and patching the original file with diff above results in:

doc:
  version: 1.0.1
  name: numbers & letters

numbers:
  - 1
  - 2
letters:
  - a

notice how the blank line after the new element in the numbers array remains stripped while the others are back. This happens because diff treats the blank line deletion and the addition of the new array element as part of the same hunk, so --ignore-blank-lines does not ignore it.

This is not ideal by any means, but in my case it has helped a lot since my files are big and have lots of blank lines. I'm sharing this in case someone else finds it useful too.
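To spot the case described above without eyeballing the file, something like the following could be run after patching. It is only a rough sketch: the function name `blank_lines_lost` is made up for illustration, and it simply compares blank-line counts between a saved copy of the original and the patched file.

```shell
# Hypothetical helper: print how many blank (or whitespace-only) lines
# were lost between the original file ($1) and the patched file ($2).
blank_lines_lost() (
  before=$(grep -c '^[[:space:]]*$' "$1")
  after=$(grep -c '^[[:space:]]*$' "$2")
  echo $((before - after))
)

# Usage (file names are placeholders):
#   cp a.yaml a.orig.yaml
#   ...run yq, diff and patch as described above...
#   blank_lines_lost a.orig.yaml a.yaml
```

For the numbers example above this would report 1, since the blank line after the array is the only one that does not come back.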

arcesino avatar Apr 30 '21 20:04 arcesino

Thanks! I used @arcesino's approach for this one-liner.

filename=xxx
version=xxx

patch "$filename" <<< "$(diff -U0 -w -b --ignore-blank-lines "$filename" <(yq eval ".my.version = \"$version\"" "$filename"))"

lirlia avatar Jan 25 '22 07:01 lirlia

Thanks for the idea with diff & patch @arcesino .

In my case the removal of blank lines introduced by diff was unfortunately unacceptable, so I had to dig further.

And I found a solution.

The approach is as follows: I remove blank lines from the original YAML and create a diff between that and my altered YAML. The patch is then applied to the original, and no new spaces are introduced.

Here is an example:

The starting point is my original YAML, where the value of key "secrets.TEST" should be updated

---
config:

  # mysql
  DATABASE_PROTOCOL: "mysql"
  # instance fqdn
  DATABASE_HOST: "mysql"

secrets:
  # db password
  DATABASE_PASSWORD: "password"

  # example
  TEST: "foo"

# other values
#[...]

Step 1: updating the value & creating a copy

yq '.secrets.TEST = "NewValue"' sample.yaml > sample.yaml.new

Step 2: removing blanks from the original

yq '.' sample.yaml > sample.yaml.noblanks

Step 3: creating a patch

diff -B sample.yaml.noblanks sample.yaml.new > patch.file

the patch contains then only the value diffs:

$> cat patch.file
11c11
<   TEST: "foo"
---
>   TEST: "NewValue"

Step 4: apply the patch to the original

patch sample.yaml patch.file

Utils used:

  • yq 4.20.2
  • patch 2.7.6
  • diff 3.7

OS: debian 11
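The four steps above could be wrapped in a single function, roughly like this. It is a sketch only: the name `yq_keep_blanks` and the temp-file layout are made up here, and it assumes yq v4, diff and patch are on the PATH. As noted elsewhere in the thread, this approach is meant for changing existing values, not for adding lines.

```shell
# Hypothetical wrapper around steps 1-4 (the function name is an assumption).
# Usage: yq_keep_blanks '<yq expression>' <file>
yq_keep_blanks() (
  expr=$1 file=$2
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT
  # Step 1: edited copy (blank lines are stripped by yq)
  yq "$expr" "$file" > "$tmp/new" || exit 1
  # Step 2: unedited copy with the same yq formatting
  yq '.' "$file" > "$tmp/noblanks" || exit 1
  # Step 3: the diff now contains only the value changes
  # (diff exits with 1 when the files differ, which is expected here)
  diff -B "$tmp/noblanks" "$tmp/new" > "$tmp/patch.file"
  # Nothing changed, nothing to patch
  [ -s "$tmp/patch.file" ] || exit 0
  # Step 4: apply the value changes to the untouched original
  patch "$file" "$tmp/patch.file"
)

# Usage: yq_keep_blanks '.secrets.TEST = "NewValue"' sample.yaml
```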

vladimir259 avatar Feb 25 '22 08:02 vladimir259

Good idea! I turned that into fish and bash functions in this Gist:

#fish
function yqblank;
  yq eval "$argv[1]" "$argv[2]" | diff -B "$argv[2]" - | patch "$argv[2]" -o -
end

#bash
yqblank() {
  yq eval $1 $2 | diff -B $2 - | patch $2 -o -
}

This makes it possible to use yq without changing (most of) the blank lines. Usage is as follows:

yqblank '.' file_name.yml

clementnuss avatar Apr 27 '22 05:04 clementnuss

@clementnuss I think patch $2 -o - does not work and -o should be removed there.

#bash
yqblank() {
  yq eval $1 $2 | diff -B $2 - | patch $2 -
}

raQai avatar Apr 29 '22 14:04 raQai

@clementnuss I think patch $2 -o - does not work and -o should be removed there.

@raQai, thank you! Just that the arguments have to be quoted properly, also eval/e can be omitted since yq 4.18.1:

#bash
yqblank() {
  yq "$1" "$2" | diff -B "$2" - | patch "$2" -
}

ryenus avatar Apr 30 '22 09:04 ryenus

Oh yeah, I forgot about the quoting part :sweat_smile: I was in a hurry, so thanks for adding this :+1:

edit: I would also like to add that this still sometimes merges multi-line descriptions and arrays into one line, and it is not able to properly handle comments.

source:

  fruits: [
    Apple,
    Banana,
    Calamansi,
  ]

becomes:

  fruits: [Apple, Banana, Calamansi,]

source:

  fruits: [
    Apple,     # comment 1
    Banana,    # comment 2
    Calamansi, # comment 3
  ]

becomes:

  fruits: [
    Apple, # comment 1
    Banana, # comment 2
    Calamansi, # comment 3
  ]

(I did not verify this on my current machine but that was roughly the result)

edit2: @arcesino we also ran into the same thing you did with the .info.version update.

Long story short: We still use yq but only to get the line of the .info.version using the line operator and update it using sed.

Something along those lines should work

$ sed -i "$(yq '.info.version | line' "$file")s/$old_val/$new_val/" "$file"

This also returns the correct line if the value of .info.version is broken to the next line

info:
  version: 1.x.x # line 2

info:
  version:
    1.x.x # line 3
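As a slightly more defensive variant of the sed one-liner above, here is a rough sketch. The function name `replace_on_yq_line` is made up, and it assumes yq v4 (for the line operator) and GNU sed (for -i without a suffix). Note that the old value is used as a sed regular expression and the new value as a sed replacement, so characters like `/`, `.` or `&` need escaping by the caller.

```shell
# Hypothetical helper: replace OLD with NEW only on the line that yq
# reports for the node at PATH_EXPR.
# Usage: replace_on_yq_line '<yq path>' '<old regex>' '<new text>' <file>
replace_on_yq_line() (
  path_expr=$1 old=$2 new=$3 file=$4
  line=$(yq "$path_expr | line" "$file") || exit 1
  # Reject empty, zero, or non-numeric results
  # (treated here as "no position available").
  case $line in
    ''|0|*[!0-9]*) echo "no usable line number for $path_expr" >&2; exit 1 ;;
  esac
  # Restrict the substitution to that single line.
  sed -i "${line}s/$old/$new/" "$file"
)

# Usage: replace_on_yq_line '.info.version' '1\.0\.0' '1.0.1' "$file"
```

Because the substitution is limited to one line, the same version string appearing elsewhere in the file is left alone.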

raQai avatar Apr 30 '22 09:04 raQai

I'm hit by this too. No fix, only workarounds?

msdobrescu avatar Jul 15 '22 07:07 msdobrescu

Approach is following: i remove blanks from the original yaml and create a diff between that and my altered yaml. The patch then is applied to the original and no new spaces are introduced.

Unfortunately, this only works for changes to already existing values. If you try to add lines to the YAML, the patch ends up with offset blank lines.

I've already tested that and it does not work as expected for additions: https://github.com/andry81-devops/gh-workflow/blob/ee5d2d5b6bf59299e39baa16bb85357cf34a8561/bash/github/init-yq-workflow.sh https://github.com/andry81-devops/gh-workflow/blob/9b9d01a9b60a65d6c3c29f5b4b200409fc6a0aed/bash/cache/accum-content.sh

Search for: yq_edit, yq_diff, yq_patch

So only the diff-versus-edited-YAML approach that @arcesino showed looks reliable, not the diff-versus-unblanked-YAML one.

andry81 avatar Jul 29 '22 06:07 andry81

@arcesino

I've been dealing with this issue for a couple of days when updating very large YAML files and found a workaround using diff & patch commands that restores the stripped blank lines in most of the cases. Suppose you have the following YAML file:

This one has one disadvantage: it removes comments. And there is no way to retain comments completely correctly outside the yq utility, because how comments are formatted depends on the YAML syntax.

andry81 avatar Jul 29 '22 09:07 andry81

I have a new implementation in bash scripts that works better than all of the above.

Implementation: https://github.com/andry81-devops/gh-workflow/blob/master/bash/github/init-yq-workflow.sh

Example of usage: https://github.com/andry81-devops/gh-workflow/blob/master/bash/cache/accum-content.sh

# Usage example:
#
>yq_edit '<prefix-name>' 'edit' "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
  <list-of-yq-eval-strings> && \
  yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
  yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
#   <input-yaml>  - input yaml file path
#   <output-yaml> - output yaml file path
#
#   <output-yaml-edited>          - output file name of edited yaml
#   <output-diff-edited>          - output file name of difference file generated from edited yaml
#   <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
#   <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file

Example with test.yml:

# This file is automatically generated
#

content-index:

  timestamp: 1970-01-01T00:00:00Z

  entries:

    - dirs:

        - dir: dir-1/dir-2

          files:

            - file: file-1.dat
              md5-hash:
              timestamp: 1970-01-01T00:00:00Z

            - file: file-2.dat
              md5-hash:
              timestamp:

            - file: file-3.dat
              md5-hash:
              timestamp:

        - dir: dir-1/dir-2/dir-3

          files:

            - file: file-1.dat
              md5-hash:
              timestamp:

            - file: file-2.dat
              md5-hash:
              timestamp:

export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow

source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"

[[ -d "./temp" ]] || mkdir "./temp"

export TEMP_DIR="./temp"

yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
  ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
  yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
  yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?

PROs:

  • Can restore blank lines together with standalone comment lines: # ...
  • Can restore line end comments: key: value # ...
  • Can detect line removals, changes, and additions together.

CONs:

  • Because it relies on guessing logic, it may leave artifacts or make invalid corrections.
  • Does not restore end-of-line comments on lines where the YAML data is changed.

Related issues:

  • https://github.com/mikefarah/yq/issues/19
  • https://github.com/mikefarah/yq/issues/127
  • https://github.com/mikefarah/yq/issues/465
  • https://github.com/go-yaml/yaml/issues/627
  • https://github.com/yaml/libyaml/issues/42

andry81 avatar Aug 08 '22 05:08 andry81

Here is another possible workaround. We basically pre-format the file once with no content changes. Then make the content change. Then compare the pre-formatted and the content-changed versions to get a patch. Then apply the patch to the original file. I've only tried it for simple cases like patching the version in a helm values file. It seems to work well, and also seems to preserve comments.

$ yq --version
yq version 4.9.8
$ # The original file
$ cat values.yaml
# The app name
name: "some-app"

image:
  # The image tag
  tag: "1.2.0"

# Some other comments...
# ...
$ # Don't change anything; just let yq do its default formatting
$ yq eval --exit-status '.' values.yaml | tee out1.yaml
# The app name
name: "some-app"
image:
  # The image tag
  tag: "1.2.0"

# Some other comments...
# ...
$ # Now make the actual change
$ yq eval --exit-status '.image.tag = "1.3.0"' values.yaml | tee out2.yaml
# The app name
name: "some-app"
image:
  # The image tag
  tag: "1.3.0"

# Some other comments...
# ...
$ # Diff the two stripped files to get a minimal diff with no special flags.
$ diff out1.yaml out2.yaml | tee out.patch
5c5
<   tag: "1.2.0"
---
>   tag: "1.3.0"
$ # Apply the patch to the original file, which was unchanged so far.
$ patch values.yaml < out.patch
patching file values.yaml
$ # Inspect the final file. 
$ # Note the version was changed and everything else remained the same.
$ cat values.yaml
# The app name
name: "some-app"

image:
  # The image tag
  tag: "1.3.0"

# Some other comments...
# ...

alexklibisz avatar Feb 16 '23 20:02 alexklibisz

Here is another possible workaround. We basically just pre-strip the newlines and then re-compute the patch by comparing two stripped versions.

It has the same issues with removed comments and blank lines.

andry81 avatar Feb 17 '23 03:02 andry81

Here is another possible workaround. We basically just pre-strip the newlines and then re-compute the patch by comparing two stripped versions.

It has the same issues with comments remove.

I think it works fine with comments. I updated my original post to include comments. LMK if you still see some issue. Maybe I'm overlooking something subtle.

alexklibisz avatar Feb 17 '23 16:02 alexklibisz

I think it works fine with comments. I updated my original post to include comments. LMK if you still see some issue. Maybe I'm overlooking something subtle.

The diff reports positions in the already-stripped file:

3c3 means a change on the 3rd line, whereas the line that actually changed in the original file is the 6th:

1: # The app name
2: name: "some-app"
3: 
4: image:
5:   # The image tag
6:   tag: "1.2.0"

It is better to use a unified diff to see this:

> diff -u out1.yaml out2.yaml | tee out-uniform.patch
--- out1.yaml
+++ out2.yaml
@@ -1,3 +1,3 @@
 name: some-app
 image:
-  tag: "1.2.0"
+  tag: "1.3.0"

To reproduce the problem:

values.yaml

# The app name
name: "some-app"

image1:
  # The image1 tag
  tag: "1.2.0"
image2:
  # The image2 tag
  tag: "1.2.0"

> yq -y '.image2.tag = "1.3.0"' values.yaml | tee out2.yaml
name: some-app
image1:
  tag: "1.2.0"
image2:
  tag: "1.3.0"

> patch values.yaml -i out.patch

out.patch

5c5
<   tag: "1.2.0"
---
>   tag: "1.3.0"

values.yaml

# The app name
name: "some-app"

image1:
  # The image1 tag
  tag: "1.3.0"
image2:
  # The image2 tag
  tag: "1.2.0"

This also shows why a non-unified diff, even with default options, is less reliable for patching: it has no context lines, so patch cannot tell the two identical tag lines apart.
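One way to catch a patch that landed on the wrong line, as in the example above, is to compare the patched file with the yq output after running both through yq again. This is only a sketch: the name `yq_same_content` is made up, and it assumes the same yq binary is used for both sides so that formatting and comment handling are identical.

```shell
# Hypothetical check: succeed only if both files are identical once yq
# has normalised them (blank lines and spacing no longer matter).
# Usage: yq_same_content <patched-file> <file-produced-by-yq>
yq_same_content() (
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT
  yq '.' "$1" > "$tmp/a" || exit 1
  yq '.' "$2" > "$tmp/b" || exit 1
  # cmp -s is silent and only sets the exit status.
  cmp -s "$tmp/a" "$tmp/b"
)

# Usage (file names taken from the example above):
#   cp values.yaml values.yaml.bak
#   patch values.yaml -i out.patch
#   yq_same_content values.yaml out2.yaml || mv values.yaml.bak values.yaml
```

In the image1/image2 case the check would fail, because the patched file has the new tag under image1 while out2.yaml has it under image2.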

andry81 avatar Feb 18 '23 01:02 andry81

Will there be a fix for this issue in the future?

DavidAttar avatar Mar 29 '23 13:03 DavidAttar

It sounds like there's no workaround?

anthonyalayo avatar Apr 26 '23 07:04 anthonyalayo

prettier is the only yaml formatter I have tried that preserves blank lines correctly

Considering I switched to rome, it feels a bit annoying to have prettier installed just for its ability to format YAML files :/

chrisgrieser avatar May 01 '23 10:05 chrisgrieser

It sounds like there's no workaround?

There are several workarounds mentioned throughout the thread. Look for 👍

alexklibisz avatar May 02 '23 15:05 alexklibisz

Micro-improvement to the workaround that leaves blank lines alone: I have some YAML files with comments preceded by two blanks, like the SemVer comments left by dependabot when you reference an action by its full commit hash, like

uses: rymndhng/release-on-push-action@aebba2bbce07a9474bf95e8710e5ee8a9e922fe2  # v0.25.0

These blanks also get squashed to just one when you use yq to modify something else.

To prevent this, diff has an option, -w, to ignore all whitespace, resulting in

yq "$1" "$2" | diff -Bw "$2" - | patch "$2" -

bewuethr avatar Jun 02 '23 23:06 bewuethr

Hello @bewuethr, I have thoroughly tested the workaround you provided, and it demonstrates excellent functionality, effectively addressing the initial issue. However, I have observed that it does not preserve the newline character that exists after the line modification.

11,12c10
<   tag: ""
< 
---
>   tag: "1.0.0"

alita1991 avatar Jun 08 '23 16:06 alita1991

Hello @bewuethr, I have thoroughly tested the workaround you provided, and it demonstrates excellent functionality, effectively addressing the initial issue. However, I have observed that it does not preserve the newline character that exists after the line modification.

11,12c10
<   tag: ""
< 
---
>   tag: "1.0.0"

That's right, a blank line after a modified line gets removed! I haven't found a better workaround other than moving lines to modify away from a blank line, I'm afraid.

bewuethr avatar Jul 15 '23 01:07 bewuethr

There is an alternate underlying yaml library that claims to encode whitespace. This is a competitor to the library currently used in yq.

https://github.com/pantoniou/libfyaml

fulldecent avatar Nov 03 '23 04:11 fulldecent

This is dumb, but I'm just going to say it. If you are only using whitespace to separate sections and your sections each start with a comment like # some comment, then you can insert the whitespace back in with:

awk '/^---$/{flag=!flag; print; next} flag && /^#/{print ""} {print}'
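The one-liner above puts a blank line before every line starting with # between --- markers, including consecutive comment lines. A variant that treats a run of comment lines as one section header could look like the sketch below. The function name `respace_comment_sections` is made up, and unlike the original it ignores --- markers and works on the whole input. Like the original, it only looks at comments that start in column 1.

```shell
# Hypothetical filter: re-insert a blank line before each comment block.
# It skips the first line of the input and any comment line that directly
# follows another comment line or a blank line.
respace_comment_sections() {
  awk '
    /^#/ && NR > 1 && prev !~ /^#/ && prev != "" { print "" }
    { print; prev = $0 }
  '
}

# Usage: yq '.a = 1' file.yaml | respace_comment_sections > file.new.yaml
```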

fulldecent avatar Dec 06 '23 06:12 fulldecent