Inconsistent format with "--output table" and "--query"

Open elofu17 opened this issue 2 years ago • 8 comments

Describe the bug

When using the option --output table together with --query, the table format depends on the number of results, no matter what query you make: if the result is just one (1) row, the table is formatted one way (faulty), but if the result is two rows or more (>=2), the table is formatted correctly.

The bug is this inconsistency. You can't batch-process the results (for example extracting a particular column with cut, or post-processing the output with sed) since the format differs depending on the number of results.

Example of a single set (1) of output (faulty table output):

# aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table
-----------------------------------------------------------------
|                  DescribeDBClusterSnapshots                   |
+---------+-----------------------------------------------------+
|  c1_Id  |  database-1-instance-1-eu-north-1b-final-snapshot   |
|  c2_Size|  0                                                  |
|  c3_Time|  2019-11-15T15:04:40.902000+00:00                   |
+---------+-----------------------------------------------------+

This is not what you expect from a table. The three columns in the single data set should be columns, not rows.

Example of the same query, but now with four (4) sets of output (correct table output):

# aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table
-------------------------------------------------------------------------------------------------------
|                                    DescribeDBClusterSnapshots                                       |
+-----------------------------------------------------+----------+------------------------------------+
|                             c1_Id                   | c2_Size  |              c3_Time               |
+-----------------------------------------------------+----------+------------------------------------+
|  devdb-2023-03-08-05-45-snapshot                    |  6       |  2022-03-08T09:35:56.124000+00:00  |
|  devdb-1-restore-cluster-snapshot                   |  6       |  2022-03-08T10:03:21.712000+00:00  |
|  maindb-cluster-test-2-snapshot                     |  108     |  2022-03-21T11:57:34.037000+00:00  |
|  preupgrade-devdbs-10-21-to-12-12-2023-06-28-14-02  |  7       |  2023-06-28T14:07:56.890000+00:00  |
+-----------------------------------------------------+----------+------------------------------------+

This is correct. The output is formatted in a table as expected.

I redirect the output from the commands to a file, >> cluster-snapshots.txt. Later I run egrep '2019-|2020-|2021-|2022-' cluster-snapshots.txt and expect to see the name, size and snapshot-date of old (prior to the year 2023) db-snapshots that I should delete, but the result is broken:

|  c3_Time|  2019-11-15T15:04:40.902000+00:00                   |
|  devdb-2023-03-08-05-45-snapshot                    |  6       |  2022-03-08T09:35:56.124000+00:00  |
|  devdb-1-restore-cluster-snapshot                   |  6       |  2022-03-08T10:03:21.712000+00:00  |
|  maindb-cluster-test-2-snapshot                     |  108     |  2022-03-21T11:57:34.037000+00:00  |
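
For batch processing like this, one option that avoids depending on the table layout is --output text, which aws-cli documents as tab-separated output. The sketch below is an assumption-laden illustration rather than a fix for the bug: old_snapshots is a hypothetical helper name, and the query is rewritten as a list ([...]) instead of a hash ({...}) so that the column order is fixed. It filters on the timestamp field only, so a year that happens to appear in a snapshot name does not cause a match.

```shell
#!/usr/bin/env bash
# Keep only rows whose third tab-separated field (the snapshot timestamp)
# sorts before "2023-", i.e. snapshots created before the year 2023.
# old_snapshots is a hypothetical helper name, not part of aws-cli.
old_snapshots() {
  awk -F'\t' '$3 < "2023-"'
}

# Intended use (needs AWS credentials, so it is left commented out here):
# aws rds describe-db-cluster-snapshots \
#   --query 'DBClusterSnapshots[*].[DBClusterSnapshotIdentifier,AllocatedStorage,SnapshotCreateTime]' \
#   --output text | old_snapshots
```

Because the comparison is a plain string comparison on ISO 8601 timestamps, it works the same way for one row as for many.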

Expected Behavior

The table should always have each field in a column, even if it is just one set (one row) of results. It is expected that the table output from a CLI tool is always consistent. The output from the first example should look like this:

# aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table
-------------------------------------------------------------------------------------------------------
|                                    DescribeDBClusterSnapshots                                       |
+-----------------------------------------------------+----------+------------------------------------+
|                             c1_Id                   | c2_Size  |              c3_Time               |
+-----------------------------------------------------+----------+------------------------------------+
|  database-1-instance-1-eu-north-1b-final-snapshot   |  0       |  2019-11-15T15:04:40.902000+00:00  |
+-----------------------------------------------------+----------+------------------------------------+

Current Behavior

One (1) set of results is listed over multiple rows in a key:value style, while multiple sets are listed as rows and columns.

Reproduction Steps

See examples above.

Possible Solution

No response

Additional Information/Context

aws-cli v1 and v2 have the same bug.

CLI version used

aws-cli v1 and the same thing in v2.

Environment details (OS name and version, etc.)

Debian Linux 11

elofu17 avatar Sep 27 '23 23:09 elofu17

Hi @elofu17 - thanks for reaching out.

I attempted to reproduce with the command you ran and I'm seeing the table formatted correctly with one result or more, on aws-cli/2.13.0. Can I ask which CLI version you are using? I'll try to downgrade to CLI v1 and reproduce it to see if it outputs differently, but we do recommend v2 to our users.

aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime}' --output table
------------------------------------------------------------------------------------
|                            DescribeDBClusterSnapshots                            |
+----------------------------------+----------+------------------------------------+
|               c1_Id              | c2_Size  |              c3_Time               |
+----------------------------------+----------+------------------------------------+
|  rds:database-1-2023-09-29-11-58 |  0       |  2023-09-29T11:58:38.196000+00:00  |
+----------------------------------+----------+------------------------------------+
aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table
------------------------------------------------------------------------------------
|                            DescribeDBClusterSnapshots                            |
+----------------------------------+----------+------------------------------------+
|               c1_Id              | c2_Size  |              c3_Time               |
+----------------------------------+----------+------------------------------------+
|  rds:database-1-2023-09-29-11-58 |  0       |  2023-09-29T11:58:38.196000+00:00  |
+----------------------------------+----------+------------------------------------+
aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table
------------------------------------------------------------------------------------
|                            DescribeDBClusterSnapshots                            |
+----------------------------------+----------+------------------------------------+
|               c1_Id              | c2_Size  |              c3_Time               |
+----------------------------------+----------+------------------------------------+
|  rds:database-1-2023-09-29-11-58 |  0       |  2023-09-29T11:58:38.196000+00:00  |
|  snap                            |  1       |  2023-09-29T17:31:36.254000+00:00  |
+----------------------------------+----------+------------------------------------+

aBurmeseDev avatar Sep 29 '23 17:09 aBurmeseDev

Hi John! Thanks for investigating.

Ah! Now that you show the above and I test the same thing, I get the same results as you. It turns out I am always piping my output to e.g. less or redirecting it to a file, and that's when the problem occurs. Sorry. One should always run exactly the same command as one states in one's own bug report. :-) Doh!

So updated bug report:

Without any pipe (stdout is an interactive terminal) it looks good:

# aws-vault exec foobar -- aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].{c1_Id:VolumeId,c2_Size:Size,c3_Time:CreateTime,c4_Snapshot:SnapshotId,c5_Tag_Name:Tags[?Key==`Name`].Value | [0]} | sort_by(@, &c3_Time)' --output table
---------------------------------------------------------------------------------------------------------------
|                                               DescribeVolumes                                               |
+-----------------------+----------+----------------------------+-------------------------+-------------------+
|         c1_Id         | c2_Size  |          c3_Time           |       c4_Snapshot       |    c5_Tag_Name    |
+-----------------------+----------+----------------------------+-------------------------+-------------------+
|  vol-12345678901234567|  10      |  2021-12-07T10:41:06.502Z  |  snap-12345678901234567 |  foobar1-dev-org  |
+-----------------------+----------+----------------------------+-------------------------+-------------------+

...but with a pipe or redirect, the output is faulty (when it is only one set):

# aws-vault exec foobar -- aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].{c1_Id:VolumeId,c2_Size:Size,c3_Time:CreateTime,c4_Snapshot:SnapshotId,c5_Tag_Name:Tags[?Key==`Name`].Value | [0]} | sort_by(@, &c3_Time)' --output table | less
---------------------------------------------
|              DescribeVolumes              |
+--------------+----------------------------+
|  c1_Id       |  vol-12345678901234567     |
|  c2_Size     |  10                        |
|  c3_Time     |  2021-12-07T10:41:06.502Z  |
|  c4_Snapshot |  snap-12345678901234567    |
|  c5_Tag_Name |  foobar1-dev-org           |
+--------------+----------------------------+

I'm using the latest version of both v1 and v2.

(If you are curious, the query lists unused volumes lying around in AWS. I should really delete that one from 2021 :-) )

PS: I'm not upgrading to v2 until there's something in v2 I really need. This is because v2 is 33% slower than v1. :-( So I'm sticking to v1 for as long as it is maintained, due to performance.

elofu17 avatar Sep 30 '23 00:09 elofu17

@elofu17 - I appreciate you getting back and sharing what you found. I'm glad that you found the culprit and that the table is formatted correctly when the output is not piped to less.

Since I was curious, I tested with less 581.2 on CLI v2, and the table output was formatted correctly. It comes down to either a version issue or something else going on. To rule out a version issue, which versions of the CLI and less are you seeing the incorrect table format on? I'll test with those.

aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table | less
------------------------------------------------------------------------------------
|                            DescribeDBClusterSnapshots                            |
+----------------------------------+----------+------------------------------------+
|               c1_Id              | c2_Size  |              c3_Time               |
+----------------------------------+----------+------------------------------------+
|  snap                            |  0       |  2023-09-29T17:31:36.254000+00:00  |
|  rds:database-1-2023-10-02-12-10 |  0       |  2023-10-02T12:10:39.450000+00:00  |
+----------------------------------+----------+------------------------------------+

aBurmeseDev avatar Oct 03 '23 17:10 aBurmeseDev

Hi John!

Two things:

  1. less has nothing to do with it. You get the same problem if you simply redirect to a file (> mylogfile.txt) or pipe the output to grep or whatever. I.e. the problem seems to occur only when stdout is not an interactive shell/terminal. (See https://serverfault.com/questions/146745/how-can-i-check-in-bash-if-a-shell-is-running-in-interactive-mode for examples of how to check for it. I believe that somewhere in the aws-cli code there is a check to see whether it is running in an interactive shell/terminal, and that the columns are formatted differently when it is than when writing to a file or to a pipe. In the file or pipe case, the output is wrong when there's only one (1) set of output.)

  2. Did you only do the test in the example above? 'Cause in that example you have two (2) rows of results, and that is expected to look correct. :-) The faulty table only shows up when you have one (1) row of output (and stdout is not an interactive shell/terminal).
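
As an illustration of the kind of check described in item 1, a program or script can ask whether its stdout is attached to a terminal. This is only a minimal sketch in bash, not the aws-cli code itself (which has not been inspected here); stdout_kind is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print which kind of destination stdout (file descriptor 1) is attached to.
# stdout_kind is a hypothetical helper name used only for this illustration.
stdout_kind() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

stdout_kind          # run directly in a terminal: prints "terminal"
stdout_kind | cat    # stdout is a pipe: prints "not a terminal"
```

In the same way, appending | cat to the aws command should be enough to put it in the non-terminal case without involving less or a file.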

# aws --version
aws-cli/1.29.56 Python/3.7.3 Linux/4.19.0-24-amd64 botocore/1.31.56
# aws-vault --version
v6.3.1
# bash --version
GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)
readline v7.0-5
# echo $TERM
xterm
# stty --all
speed 38400 baud; rows 55; columns 208; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff -iuclc -ixany -imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

elofu17 avatar Oct 03 '23 20:10 elofu17

Hi @elofu17 - thanks for following up.

  • I narrowed it down to one result and ran it with less, and that messes up the table format: aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table | less
  • I also tried redirecting the output to a file, and that results in a similarly faulty format: aws rds describe-db-cluster-snapshots --query 'DBClusterSnapshots[*].{c1_Id:DBClusterSnapshotIdentifier,c2_Size:AllocatedStorage,c3_Time:SnapshotCreateTime} | sort_by(@, &c3_Time)' --output table > testFile.txt

To summarize and make sure we're on the same page, it only seems to occur when stdout is not an interactive shell/terminal AND when you have only one output result. I'll dig more into it and let you know once I find something.

Thanks for your patience.

aBurmeseDev avatar Oct 03 '23 21:10 aBurmeseDev

Nice!

Correct. The bug is triggered by the combination of --output table and --query, a result of just one (1) set, and stdout not being an interactive shell/terminal.

I have reported many bugs in my life, but such a lineup of conditions that have to match must be a record :-)

elofu17 avatar Oct 03 '23 22:10 elofu17

Hi @elofu17 - I wanted to post an update here: we've created a backlog item for the team to investigate. During the team discussion, we noticed your statement on CLI v2 performance, quoted below. Can you elaborate more on that? Is there anything that we're not aware of as far as performance between v1 and v2?

PS: I'm not upgrading to v2 until there's something in v2 I really need. This is because v2 is 33% slower than v1. :-( So I'm sticking to v1 for as long as it is maintained, due to performance.

aBurmeseDev avatar Oct 19 '23 22:10 aBurmeseDev

It's as simple as this:

I have a bash script that runs various queries like the ones in my examples above (describe EIPs, instances, snapshots, DBs, ELBs and internet/NAT/transit gateways = ca 10 queries). I loop over these queries for all my AWS accounts (ca 15 of them) = 10 * 15 = 150 aws-cli queries in total. The script takes 5 minutes and 10 seconds to run every time.

I replace aws-cli v1 (latest version) with v2 (latest version).

Now the script takes almost 7 minutes, every time.

When running aws-cli just once, it doesn't matter that v2 takes some milliseconds longer, but when you are sitting there waiting for 5 minutes -- then two additional minutes is quite a lot.
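
To put rough numbers on this, the arithmetic can be sketched in bash. The 5 min 10 s runtime and the 150 calls are taken from the description above; treating the slowdown as exactly 33% is an assumption made only for this calculation.

```shell
#!/usr/bin/env bash
v1_seconds=$((5 * 60 + 10))              # 5 min 10 s with v1, as reported
calls=$((10 * 15))                       # ca 10 queries x ca 15 accounts
v2_seconds=$((v1_seconds * 133 / 100))   # assumption: exactly 33% slower
extra_ms_per_call=$(( (v2_seconds - v1_seconds) * 1000 / calls ))

echo "v2 total: $((v2_seconds / 60)) min $((v2_seconds % 60)) s"   # 6 min 52 s
echo "extra per call: ${extra_ms_per_call} ms"                     # 680 ms
```

Under that assumption the v2 run lands at about 6 min 52 s, which matches "almost 7 minutes", and the difference works out to roughly 0.7 seconds per aws-cli invocation.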

I haven't looked into why v2 is 33% slower, but if you can optimize it and speed things up, that would be great. :-) I don't know the inner workings, but perhaps you could add some command-line options that let users disable various unneeded features, to optimize batch runs? For example, if aws-cli v2 always loads libraries x, y and z, but y and z aren't needed for my query, then I would like to be able to disable loading them, to save a few milliseconds here and there.

elofu17 avatar Oct 20 '23 23:10 elofu17