[Feature] Optimize load balancing for DataProxy
Description
When the SDK produces data, it selects a group of nodes to send to; when some of those nodes have problems, they are removed and new candidate nodes are selected. The expected behavior is:
a. reduce the impact on production so that the client is unaware of node changes;
b. allow a server node to be selected again after it recovers;
c. keep the load balanced across nodes.
Use case
No response
Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@dockerzhang This is a good idea, I would like to do this.
Motivation
Based on my reading of the source code, the DataProxy SDK currently selects DataProxy nodes by round-robin polling (when sending messages in TCP mode) and by random selection (when sending messages in HTTP mode). The round-robin approach is not efficient enough, and random selection does not easily achieve load balance.
Changes
Use a consistent hashing algorithm instead of the original round-robin and random selection.
Mechanism Options
Consistent hash algorithm with a virtual node mechanism. Refer to the article for details on the algorithm.
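As a concrete illustration of the mechanism, below is a minimal Java sketch of a consistent-hash ring with virtual nodes. The class name `ConsistentHashRing`, the `virtualNum` parameter, and the MD5-based hash are assumptions for illustration, not existing InLong code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Minimal consistent-hash ring with virtual nodes (illustrative sketch only).
 */
public class ConsistentHashRing {

    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNum;

    public ConsistentHashRing(int virtualNum) {
        this.virtualNum = virtualNum;
    }

    /** Add a physical node by placing virtualNum replicas on the ring. */
    public synchronized void addNode(String node) {
        for (int i = 0; i < virtualNum; i++) {
            ring.put(hash(node + "#VN" + i), node);
        }
    }

    /** Remove a physical node together with all of its virtual replicas. */
    public synchronized void removeNode(String node) {
        for (int i = 0; i < virtualNum; i++) {
            ring.remove(hash(node + "#VN" + i));
        }
    }

    /** Pick the first node clockwise from the key's hash position. */
    public synchronized String select(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    /** MD5-based hash reduced to a long, to spread virtual nodes evenly on the ring. */
    private static long hash(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (digest[3] & 0xFF) << 24)
                    | ((long) (digest[2] & 0xFF) << 16)
                    | ((long) (digest[1] & 0xFF) << 8)
                    | (digest[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each physical node contributes `virtualNum` positions on the ring, so adding or removing a node only remaps the keys that fell on those positions, which is what keeps the impact on production small.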
Design
Based on my reading of the source code, the following functions and classes need to be modified:
- org.apache.inlong.sdk.dataproxy.network.ClientMgr.getClientByRoundRobin(): this function obtains a DataProxy node by round-robin polling (a possible hash-based replacement is sketched after this list).
- org.apache.inlong.sdk.dataproxy.http.InternalHttpSender.sendMessageWithHostInfo(List<String> bodies, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit): this function selects a DataProxy node by picking a HostInfo at random.
- The fields of the DataProxy node class need to be updated to carry virtual node information.
- The hash ring and virtual nodes need to be completed on the DataProxy side, and the strategy for acquiring DataProxy nodes on the SDK side must be updated at the same time.
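As referenced in the list above, here is a hedged sketch of how the SDK-side selection could use the ring instead of round-robin/random. The selector class, its method names, the virtual-node count of 100, and the `groupId + streamId` key are illustrative assumptions built on the `ConsistentHashRing` sketch above, not the actual ClientMgr or InternalHttpSender API.

```java
import java.util.List;

/**
 * Hypothetical replacement for round-robin / random selection on the SDK side.
 * Class and method names are illustrative only.
 */
public class HashBasedHostSelector {

    // Virtual-node count of 100 is an arbitrary example value.
    private volatile ConsistentHashRing ring = new ConsistentHashRing(100);

    /** Rebuild the ring whenever the set of connected DataProxy hosts changes. */
    public void updateHosts(List<String> connectedHosts) {
        ConsistentHashRing newRing = new ConsistentHashRing(100);
        for (String host : connectedHosts) {
            newRing.addNode(host);
        }
        ring = newRing; // swap the reference so callers never see a half-built ring
    }

    /**
     * Pick the DataProxy host for a message. Hashing a stable key such as
     * groupId + streamId keeps one stream pinned to one node while it is alive.
     */
    public String selectHost(String groupId, String streamId) {
        return ring.select(groupId + "#" + streamId);
    }
}
```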
In point 4, it is said that "The hash ring and virtual nodes need to be completed on the DataProxy side". Here, does "DataProxy node" mean the nodes that have already established connections, or all DataProxy nodes in a cluster? What's more, the SDK obtains DataProxy node information by interacting with the Manager. Could you describe the updating strategy of point 4 more specifically?
- The nodes joining the hash ring should be the nodes that have established connections; after all, it is impossible to send data to nodes without an established connection.
- When I looked at the source code before, I didn't notice that the SDK and DataProxy interact through the Manager; I thought they interacted directly. The hash ring should then be maintained by the end that keeps the node information, which leaves two cases:
  (1) The Manager maintains the node information. The update strategy is to notify the Manager when a DataProxy node changes; the Manager then removes the node from the hash ring and migrates its data according to the consistent hash algorithm, or redirects requests. During this process, node requests from the SDK are redirected to the new node, so the client is unaware of the change.
  (2) The DataProxy side maintains the node information. This is similar to (1); it only changes where the consistent hashing algorithm is implemented.
- Updating strategy of point 4:
  (1) When a DataProxy node changes, first generate virtual_num virtual nodes for it and add them to the table of real and virtual nodes.
  (2) Add all virtual nodes to the hash ring. Because the SDK-to-DataProxy path only writes and does not read (if there is no misunderstanding, it should be like this), a node request arriving from the SDK while the hash ring is being modified only needs to be redirected to the new node.
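To make this updating strategy concrete, here is a hedged sketch of how membership changes could be applied to the ring, reusing the `ConsistentHashRing` sketch above. The `NodeChangeHandler` class and its `onNodeUp`/`onNodeDown` callbacks are hypothetical; how the Manager or DataProxy side actually delivers such notifications is not specified here.

```java
/**
 * Hypothetical handler for DataProxy membership changes, following the strategy
 * described above. The callbacks and their trigger source are assumptions, not
 * existing InLong APIs.
 */
public class NodeChangeHandler {

    private final ConsistentHashRing ring;

    public NodeChangeHandler(int virtualNum) {
        this.ring = new ConsistentHashRing(virtualNum);
    }

    /** A node comes online: place its virtual_num virtual replicas on the ring. */
    public void onNodeUp(String node) {
        ring.addNode(node);
    }

    /**
     * A node goes offline: remove its virtual replicas. Since SDK -> DataProxy
     * traffic is write-only, there is nothing to re-read; later select(key)
     * calls simply land on the next node clockwise, so the client does not notice.
     */
    public void onNodeDown(String node) {
        ring.removeNode(node);
    }
}
```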