redpanda Adding stuctured output via new --format flag on most rpk commands.…

Cover letter

Adds json/yaml output to most rpk commands. I'm happy to do the rest but these are the ones I needed to build puppet config management around rpk. I also wanted to get the team's opinion before spending more time as I'm still learning Go and may be doing this incredibly inefficiently. I'm also not sure if dev is the right target for the PR, even if it was accepted. I'm happy for any and all comments/criticisms to make things better.

Backport Required

[ ] not a bug fix
[ ] issue does not exist in previous branches
[x] papercut/not impactful enough to backport
[ ] v22.2.x
[ ] v22.1.x
[ ] v21.11.x

UX changes

Add --format param to most rpk commands to format output as text, json, or yaml, with text being the default, which is the existing table style output that was already how rpk worked.

Release notes

Added --format flag to most rpk command output that supports text/json/yaml as argument, defaults to text.

Features

New --format param to most prk commands to support yaml/json output.

Improvements

Allows monitoring and config management tools to wrap around the rpk command vs having to re-implement direct rest/kafka api clients to accomplish task the rpk cli already fully supports.

Nov 09 '22 16:11 alnet

All committers have signed the CLA.

Nov 09 '22 16:11 CLAassistant

Rebased from latest dev to make sure merging is clean, assuming the branch gets accepted.

Nov 13 '22 15:11 alnet

I think I've got vscode properly configured to auto format on save now. Sorry about messing up simple linter stuff so many times.

Nov 16 '22 15:11 alnet

The last two failing checks mention "Sign-in with your Redpanda Data Inc. account to access Buildkite" when I click on the details link, so I assume those are not something I can address. If I'm wrong there, please let me know what I need to do.

Nov 17 '22 02:11 alnet

Do I need to be doing something else to move this PR along?

Nov 22 '22 01:11 alnet

I'll dive into fixing errors in the morning, and will rebase on the latest dev. For the commenting asking about "omitempty", did you mean all the struct tags, or just message/errors?

Nov 22 '22 03:11 alnet

I'm actually not 100% positive and might need to bounce this idea off of @r-vasquez -- I think APIs usually optionally include an error, but always include other fields. I'd like to go with what's most common. If GCP / AWS APIs always include all fields, then we should too. If APIs commonly leave out empty fields, then we can leave out empty fields.

Nov 22 '22 04:11 twmb

Latest push contains most of the changes suggested and a rebase from dev. I did not add the omitempty to all the json/yaml tags everywhere as it seems like that's not fully decided yet. If the team wants them added to all/specific tags, l'll update accordingly once that's decided.

Nov 22 '22 18:11 alnet

I thought I had vscode setup to auto format on save... but apparently it's not working. When manually running format on the listed files, I saw the expected changes. I apologize for the unformatted commit and will look into vscode's config again.

Edit: the vscode-vim plugin for vs code doesn't fire save hooks on :w apparently. If anyone happens to be using that setup, be aware.

Nov 22 '22 21:11 alnet

Rebased from latest dev branch.

Nov 29 '22 13:11 alnet

Rebased from latest dev again.

Dec 03 '22 05:12 alnet

Rebased again to keep things close to upstream.

Dec 06 '22 20:12 alnet

Hello @alnet, don't worry about keeping the branch rebased (until you have a merge conflict), our CI rebase and test with the tip of dev so after reviewing + approving the PR I'll rerun everything.

Dec 08 '22 15:12 r-vasquez

Alrighty

Dec 08 '22 16:12 alnet

There are mismatch also in rpk group describe vs JSON format:

$ rpk group describe g1
GROUP        g1
COORDINATOR  1
STATE        Empty
BALANCER     
MEMBERS      0
TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   MEMBER-ID  CLIENT-ID  HOST
oct-10     0          -               2               2                           
test       0          -               18              18                          
test-seek  0          -               9               9        

$ rpk group describe g1 --format json
{"groups":[{"group":"g1","coordinator":{"node_id":1,"port":9092,"host":"0.0.0.0","rack":null},"state":"Empty","protocol_type":"consumer","protocol":"","members":[],"described_rows":{"rows":[{"topic":"oct-10","partition":0,"current_offset":"-","log_end_offset":2,"lag":"2","member_id":"","instance_id":null,"client_id":"","host":"","err":null},{"topic":"test","partition":0,"current_offset":"-","log_end_offset":18,"lag":"18","member_id":"","instance_id":null,"client_id":"","host":"","err":null},{"topic":"test-seek","partition":0,"current_offset":"-","log_end_offset":9,"lag":"9","member_id":"","instance_id":null,"client_id":"","host":"","err":null}],"use_instance_id":false,"use_err":false},"err":null}]}

Comments here:

Coordinator JSON object includes id, port, host, and rack, vs the table coordinator property which only includes the id.
described_rows are actually topics, described rows don't mean much to our end users. Same with row.
Topics (row) include an instance-id property, our table does not include that.
At the end of the JSON object we add use_instance_id and use_err property, these 2 properties don't mean anything to our end users and should be stripped from the JSON, those are internal.

Dec 09 '22 14:12 r-vasquez

I wasn't sure if emulating the table output as closely as possible, or providing more of the raw values coming back from the internal object was the right way to go. My initial thoughts were to provide it all in some meaningful way and let the consumer decide what they care about. This was the logic behind always including the detailed output when --format != text since the consumer could simply ignore that detail if they didn't care.

Is it okay to interpret this comment as "emulate existing table output?" and so go through my additions to only provide the same data as the text tables (detailed output included?)

Dec 09 '22 15:12 alnet

I'll say "emulate existing command output" instead of "emulate existing table output". That means that we display the same info given the same set of flags in the command, either in text format or JSON format.

Dec 09 '22 16:12 r-vasquez

"emulate existing command output" is what I meant, wording was poor on my part. Do you want me to proceed with creating a new PR for just one command, and then move on with one PR per command to make the review burden easier? So first one would be adding --format to rpk topic list as well as the formatted output of a new topic being added, as was given as an example easier that will have the changes only to the out package and the code that uses it in topic list command, once that's polished up and accepted move on to the next, etc? Or would you like it broken down some other way, etc?

It may be a few days before I have time to work on this again so plenty of time to decide on how you'd prefer it sliced up and I'll hopefully get it done in downtime over the upcoming holidays.

Dec 12 '22 14:12 alnet

We need to be consistent on 2 things:

The information should be the same in all formats.
The flags that modify the output, should modify the output in the other formats as well.

A couple of examples:

$ rpk acl list --format text
PRINCIPAL     HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:my-user  *     TOPIC          demo           LITERAL                ALL        ALLOW

$ rpk acl list --format json
{
   "acl":[
      {
         "principal":"User:my-user",
         "host":"*",
         "resource_type":"TOPIC",
         "resource_name":"demo",
         "resource_pattern_type":"LITERAL",
         "operation":"ALL",
         "permission":"ALLOW",
         "error":""
      }
   ]
}

$ rpk topic list --format text
NAME       PARTITIONS  REPLICAS
asdf       3           1
demo       1           1

$ rpk topic list --format json
{
   "topics":[
      {
         "name":"asdf",
         "partitions":3,
         "replicas":1
      },
      {
         "name":"demo",
         "partitions":3,
         "replicas":1
      }
   ]
}

$ rpk topic list --detailed --format text

asdf, 3 partitions, 1 replicas
      PARTITION  LEADER  EPOCH  REPLICAS
      0          1       39     [1]
      1          1       39     [1]
      2          1       39     [1]

demo, 1 partitions, 1 replicas
      PARTITION  LEADER  EPOCH  REPLICAS
      0          1       3      [1]

$ rpk topic list --detailed --format json
{
   "topics":[
      {
         "name":"asdf",
         "partitions":3,
         "replicas":1,
         "partition_detail":[
            {
               "partition":0,
               "leader":1,
               "epoch":39,
               "replicas":[
                  1
               ]
            },
            {
               "partition":1,
               "leader":1,
               "epoch":39,
               "replicas":[
                  1
               ]
            },
            {
               "partition":2,
               "leader":1,
               "epoch":39,
               "replicas":[
                  1
               ]
            }
         ]
      },
      {
         "name":"demo",
         "partitions":3,
         "replicas":1,
         "partition_detail":[
            {
               "partition":0,
               "leader":1,
               "epoch":3,
               "replicas":[
                  1
               ]
            }
         ]
      }
   ]
}

Dec 13 '22 18:12 r-vasquez

As the colleague (at the time) of @alnet who suggested and encouraged this solution to the problem of automating around the tooling - to work some structured data into the tool rather than parsing the text output. and to cooperate with you maintainers on getting a PR upstream - I find the progress all of you have made delightful. I'm sure this will help with the problems we were facing, and I feel confident this could help others including perhaps my new team.

I don't have much else to say at this point besides trying to send everyone in the collaborative back-and-forth on this PR some encouraging and thankful words.

Dec 16 '22 05:12 cestith

It turns out my new team at my new employer would find this handy. Is there any movement on this PR?

May 17 '23 14:05 cestith

Sadly no, our use case hasn't grown as much as I thought it would at work, and my free time at home has been cut into heavily in the past couple months.

May 17 '23 15:05 alnet

We might have time to take ownership of this this quarter -- the biggest thing I think is for us to comprehensively revisit what JSON shapes we want to expose and then wire that through.

May 17 '23 15:05 twmb

Closing this, we are beginning to add --format as well

Dec 07 '23 04:12 twmb