etcd Add better verbosity to the etcdctl endpoint status output that includes percentage of DB used (to illustrate Defrag rate)

What would you like to be added?

I have opened a PR with the specific changes: https://github.com/etcd-io/etcd/pull/17871

Why is this needed?

Adding the output from OMC which grants visibility of the size of the DB, how much is in use, and then providing the difference as a percentage for quick visual on storage impact on the cluster. A minor change but one that makes it much easier to illustrate where storage is going.

Current release:

$ etcdctl endpoint status -w table

sh-4.4# etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://xxx.xxx.xxx:2379 | 1fc63d8d0eb5fa7b |  3.5.9 |  6.7 GB |      false |      false |         5854 |    4073918328 |            4073918328 |        |
| https://xxx.xxx.xxx:2379 | 3bd2d35c028a6a2f |  3.5.9 |  6.8 GB |     true |      false |         5854 |    4073918329 |            4073918329 |        |
| https://xxx.xxx.xxx:2379 | 41cbbc084d60007d |  3.5.9 |  6.8 GB |     false |      false |         5854 |    4073918329 |            4073918329 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

proposed change: $ etcdctl endpoint status -w table

+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE/IN USE | NOT USED | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
| https://xxx.xxx.xxx:2379 | 1fc63d8d0eb5fa7b | 3.5.9   | 6.7 GB/5.6 GB  | 18%      | false     | false      |      5853 | 4073918326 |         4073918326 |        |
| https://xxx.xxx.xxx:2379 | 3bd2d35c028a6a2f | 3.5.9   | 6.8 GB/5.6 GB  | 19%      | true      | false      |      5853 | 4073918327 |         4073918327 |        |
| https://xxx.xxx.xxx:2379 | 41cbbc084d60007d | 3.5.9   | 6.8 GB/5.6 GB  | 18%      | false     | false      |      5853 | 4073918330 |         4073918330 |        |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+

This output is available in OMC (used for analysis of existing etcd databases running on openshift presently) and visual representation is not available to customers as a default, when it could be.

Apr 24 '24 16:04 Scotchman0

@Scotchman0, it may be helpful if you can add the current output to compare to your suggestion.

/assign @Scotchman0

Apr 24 '24 20:04 ivanvc

I like the idea, but after seeing the implementation, I wonder if it would be better to add more columns rather than reusing/multi-purposing the current column. I defer to @ahrtr / @serathius with their point of view.

Apr 24 '24 20:04 ivanvc

updated the initial with the comparative of what we've got now versus proposed change - I'm open to separating by column; fairly easy to make those a separate block instead of joining.

Apr 24 '24 20:04 Scotchman0

alternative implementation with separate columns for comparison might look like the following if it's desirable to segment instead of conjoin an output block:

+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IN USE | NOT USED | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
| https://xxx.xxx.xxx:2379    | 1fc63d8d0eb5fa7b | 3.5.9   | 6.7 GB | 5.6 GB  | 18%      | false     | false      |      5853 | 4073918326 |         4073918326 |        |
| https://xxx.xxx.xxx:2379    | 3bd2d35c028a6a2f | 3.5.9   | 6.8 GB | 5.6 GB  | 19%      | true      | false      |      5853 | 4073918327 |         4073918327 |        |
| https://xxx.xxx.xxx:2379    | 41cbbc084d60007d | 3.5.9   | 6.8 GB | 5.6 GB  | 18%      | false     | false      |      5853 | 4073918330 |         4073918330 |        |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+

Apr 24 '24 21:04 Scotchman0

Would it be easier to put the storage information as usage 5.6/6.7 GB (18%) ?

I would be also interested if we could quota somewhere, not sure it's accessible via API.

Apr 25 '24 08:04 serathius

I agree it's a minor enhancement, but on other hand, it's minor breaking on the user experience. I like the idea of adding a separate new column.

I wonder if it would be better to add more columns rather than reusing/multi-purposing the current column

Makes sense to me.

I would be also interested if we could quota somewhere, not sure it's accessible via API.

Not sure I got the point. Do you mean that etcdserver also returns storage quota (e.g. 2GiB) to client when users run etcdctl endpoint status? If yes, I definitely support it. Once it's supported, then we can remove --etcd-storage-quota-bytes from the etcd-defrag tool :)

Apr 25 '24 09:04 ahrtr

I got a few minutes today to just pull the config through: https://github.com/etcd-io/etcd/pull/17877

will add more tests in a bit, you guys can have a look in the meantime :+1:

Apr 25 '24 14:04 tjungblu

I went ahead and modified the PR #17871 to include separated columns for each output + a new block for Quota pending #17877 approval/merge to reference the new quota block: might look like the below:

+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IN USE | NOT USED | QUOTA | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+--------+
| https://xxx.xxx.xxx:2379    | 1fc63d8d0eb5fa7b | 3.5.9   | 6.7 GB | 5.6 GB  | 18%      | ??    | false     | false      |      5853 | 4073918326 |         4073918326 |        |
| https://xxx.xxx.xxx:2379    | 3bd2d35c028a6a2f | 3.5.9   | 6.8 GB | 5.6 GB  | 19%      | ??    | true      | false      |      5853 | 4073918327 |         4073918327 |        |
| https://xxx.xxx.xxx:2379    | 41cbbc084d60007d | 3.5.9   | 6.8 GB | 5.6 GB  | 18%      | ??    | false     | false      |      5853 | 4073918330 |         4073918330 |        |
+-----------------------------+------------------+---------+----------------+----------+-----------+------------+-----------+------------+--------------------+---------

Apr 25 '24 22:04 Scotchman0

I agree it's a minor enhancement, but on other hand, it's minor breaking on the user experience.

Human readable table is not an API. We don't guarantee column names or contents. It should be different to other formats like json, but table is not parsable and it should not be so.

Apr 26 '24 06:04 serathius

Human readable table is not an API.

API != user experience. API is part of user experience. Since it doesn't change the json format which is usually parsed by program, so it's the reason why I call it a minor breaking.

Apr 26 '24 09:04 ahrtr

Just an update to indicate the PR is cleaned/approved for final eyes/checks/thoughts here.

./etcd --quota-backend-bytes '10737418240'

 sunbro@apollo  ~/Downloads/etcd/bin   patch-1  ./etcdctl endpoint status -w table
+----------------+------------------+---------------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        |    VERSION    | STORAGE VERSION | DB SIZE | IN USE | PERCENTAGE NOT IN USE | QUOTA | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.6.0-alpha.0 |           3.6.0 |   20 kB |  16 kB |                   20% | 11 GB |      true |      false |         3 |          8 |                  8 |        |
+----------------+------------------+---------------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+

./etcdctl endpoint status -w json 
[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":1,"raft_term":3},"version":"3.6.0-alpha.0","dbSize":20480,"leader":10276657743932975437,"raftIndex":8,"raftTerm":3,"raftAppliedIndex":8,"dbSizeInUse":16384,"storageVersion":"3.6.0","dbSizeQuota":10737418240}}]

May 21 '24 14:05 Scotchman0

Done in https://github.com/etcd-io/etcd/pull/17871

May 21 '24 15:05 ahrtr

etcd etcd copied to clipboard

Add better verbosity to the etcdctl endpoint status output that includes percentage of DB used (to illustrate Defrag rate)

What would you like to be added?

Why is this needed?

etcd
etcd copied to clipboard