
[Feature Request] Optimize the api _cat/nodes

Open kkewwei opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Please describe

Currently, the _cat/nodes handler works as follows:

        return channel -> client.admin().cluster().state(clusterStateRequest, new RestActionListener<>(channel) {
            ......
            nodesInfoRequest.timeout(request.param("timeout"));
            client.admin().cluster().nodesInfo(nodesInfoRequest, new RestActionListener<NodesInfoResponse>(channel) {
                ......
                // reached only after all nodes have responded to nodesInfo
                nodesStatsRequest.timeout(request.param("timeout"));
                client.admin().cluster().nodesStats(nodesStatsRequest, new RestResponseListener<NodesStatsResponse>(channel) {
                    ......
                });
            });
        });

There appear to be two problems:

  1. cluster().nodesInfo() and cluster().nodesStats() each apply the timeout separately. If the client passes a timeout of 30s, these two calls alone can take up to 60s, even before counting cluster().state(). That is twice what the caller expects.
  2. cluster().nodesStats() is called only after every node has returned its response to cluster().nodesInfo(). In large clusters it is normal for a node to be slow (for example, during a full GC), so if a few nodes are blocked in cluster().nodesInfo(), the whole API is blocked.
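
To make the arithmetic in the first problem concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and nothing here is OpenSearch code; it only shows that sequential stages with independent timeouts add up in the worst case.

```java
import java.time.Duration;
import java.util.List;

public class WorstCaseLatency {

    // Stages run one after another and each gets its own full timeout,
    // so the worst case is the sum of the per-stage timeouts.
    public static Duration sequentialWorstCase(List<Duration> stageTimeouts) {
        Duration total = Duration.ZERO;
        for (Duration timeout : stageTimeouts) {
            total = total.plus(timeout);
        }
        return total;
    }

    public static void main(String[] args) {
        Duration clientTimeout = Duration.ofSeconds(30);
        // Two stages (nodesInfo, then nodesStats), each allowed the full 30s.
        Duration worst = sequentialWorstCase(List.of(clientTimeout, clientTimeout));
        System.out.println(worst.getSeconds()); // prints 60
    }
}
```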

Describe the solution you'd like

  1. If the timeout for _cat/nodes is 30s, the overall request time should be around 30s.
  2. Blocked nodes should not prevent metrics from being collected from the remaining nodes. cluster().nodesInfo() and cluster().nodesStats() should be called separately for each node.
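
The first point amounts to one deadline shared by every stage instead of a fresh timeout per stage. A minimal sketch of that bookkeeping, assuming a hypothetical `Deadline` helper (not an OpenSearch type) built only on the JDK:

```java
import java.util.concurrent.TimeUnit;

public class Deadline {
    private final long startNanos;
    private final long timeoutNanos;

    public Deadline(long startNanos, long timeout, TimeUnit unit) {
        this.startNanos = startNanos;
        this.timeoutNanos = unit.toNanos(timeout);
    }

    // Time left at nowNanos, clamped so it never goes negative.
    public long remainingMillis(long nowNanos) {
        long left = timeoutNanos - (nowNanos - startNanos);
        return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(left));
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Deadline deadline = new Deadline(start, 30, TimeUnit.SECONDS);
        // Pretend the cluster state call took 2s and nodesInfo another 5s.
        long afterState = start + TimeUnit.SECONDS.toNanos(2);
        long afterInfo = afterState + TimeUnit.SECONDS.toNanos(5);
        System.out.println(deadline.remainingMillis(afterState)); // prints 28000
        System.out.println(deadline.remainingMillis(afterInfo));  // prints 23000
    }
}
```

Each later stage would be given only `remainingMillis(...)` as its timeout, so the stages together cannot exceed the original 30s.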

The code could look like this:

        long time1 = threadPool.relativeTimeInMillis();
        return channel -> client.admin().cluster().state(clusterStateRequest, new RestActionListener<>(channel) {
            ......
            long time2 = threadPool.relativeTimeInMillis();
            // nodesInfo gets only what is left of the overall timeout
            nodesInfoRequest.timeout(timeout - (time2 - time1));
            for (String nodeId : nodeIds) {
                nodesInfoRequest.nodesIds(nodeId);
                client.admin().cluster().nodesInfo(nodesInfoRequest, new RestActionListener<NodesInfoResponse>(channel) {
                    ......
                    long time3 = threadPool.relativeTimeInMillis();
                    // nodesStats gets what is left after state and nodesInfo
                    nodesStatsRequest.timeout(timeout - (time3 - time1));
                    nodesStatsRequest.nodesIds(nodeId);
                    client.admin().cluster().nodesStats(nodesStatsRequest, new RestResponseListener<NodesStatsResponse>(channel) {
                        ......
                    });
                });
            }
            // send the table once every node has responded or timed out
            channel.sendResponse(RestTable.buildResponse(buildTable(fullId, request, clusterStateResponse, nodesInfoResponse, nodesStatsResponse), channel));
        });
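
The per-node idea can also be tried outside OpenSearch. The following is a rough, self-contained sketch that uses the JDK's `CompletableFuture` rather than the OpenSearch client and listener classes. `PerNodeFanOut`, `NodeRow`, `fetchInfo` and `fetchStats` are hypothetical stand-ins for the real nodesInfo/nodesStats calls. Each node gets its own info -> stats chain under a single time budget, and a node that does not answer in time is left out instead of blocking the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PerNodeFanOut {

    /** Hypothetical row: what a single node contributes to the table. */
    public static final class NodeRow {
        public final String nodeId;
        public final String info;
        public final String stats;

        NodeRow(String nodeId, String info, String stats) {
            this.nodeId = nodeId;
            this.info = info;
            this.stats = stats;
        }
    }

    public static List<NodeRow> collect(
            List<String> nodeIds,
            Function<String, CompletableFuture<String>> fetchInfo,
            Function<String, CompletableFuture<String>> fetchStats,
            long timeoutMillis) {
        // Start one independent info -> stats chain per node.
        List<CompletableFuture<NodeRow>> perNode = nodeIds.stream()
            .map(id -> fetchInfo.apply(id)
                .thenCompose(info -> fetchStats.apply(id)
                    .thenApply(stats -> new NodeRow(id, info, stats)))
                // One budget covers both calls for this node.
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // A slow or failed node becomes null instead of failing everything.
                .exceptionally(error -> null))
            .collect(Collectors.toList());

        List<NodeRow> rows = new ArrayList<>();
        for (CompletableFuture<NodeRow> future : perNode) {
            NodeRow row = future.join(); // does not throw: failures were mapped to null
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // node-2 simulates a node stuck in a long GC pause: its future never completes.
        Function<String, CompletableFuture<String>> fetchInfo = id ->
            id.equals("node-2")
                ? new CompletableFuture<String>()
                : CompletableFuture.completedFuture("info:" + id);
        Function<String, CompletableFuture<String>> fetchStats = id ->
            CompletableFuture.completedFuture("stats:" + id);

        List<NodeRow> rows = collect(List.of("node-1", "node-2", "node-3"), fetchInfo, fetchStats, 200);
        for (NodeRow row : rows) {
            System.out.println(row.nodeId + " " + row.info + " " + row.stats);
        }
    }
}
```

Running `main` prints rows for node-1 and node-3 only, after roughly the 200 ms budget. A real change would still have to decide how a missing node should appear in the table, which this sketch does not address.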

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

No response

kkewwei, Jul 14 '24 03:07