[INLONG-5101][DataProxy] Optimize load balancing for DataProxy

Open rhizoma-atractylodis opened this issue 3 years ago • 0 comments

fix #5101

Motivation

Based on my reading of the source code, the DataProxy SDK currently selects DataProxy nodes by polling (round-robin) when sending messages in TCP mode and by random selection when sending messages in HTTP mode. The polling method is not efficient enough, and the random method does not reliably achieve load balancing.

Changes

Use a consistent hashing algorithm in place of the original polling and random selection.

Mechanism Options

Consistent hash algorithm with a virtual node mechanism. Refer to the article for details on the algorithm.
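For readers unfamiliar with the mechanism, here is a minimal sketch of a consistent hash ring with virtual nodes. It is not taken from the InLong code base: the class name ConsistentHashRing, the MD5-based hash, the "node#i" virtual node naming, and the choice of routing key are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not an InLong class: a hash ring with virtual nodes.
final class ConsistentHashRing {
    // Ring position -> physical node (for example "host:port").
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        if (virtualNodes <= 0) {
            throw new IllegalArgumentException("virtualNodes must be positive");
        }
        this.virtualNodes = virtualNodes;
    }

    // Place `virtualNodes` points on the ring for one physical node.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only the points that still belong to this node.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Walk clockwise from the key's position to the next virtual node,
    // wrapping around to the first entry at the end of the ring.
    String select(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, packed into a long (assumed hash choice).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A caller would add each DataProxy address once with addNode and then call select with a routing key. Which key to use (for example, a combination of groupId and streamId) is not specified in this issue and is left open here. The property that matters for load balancing is that removing one node only remaps the keys that were assigned to that node; all other keys keep their assignment.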

Design

Based on my reading of the source code, the following functions need to be modified:

org.apache.inlong.sdk.dataproxy.network.ClientMgr.getClientByRoundRobin(): This function obtains the DataProxy node by polling.

org.apache.inlong.sdk.dataproxy.http.InternalHttpSender.sendMessageWithHostInfo(List bodies, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit): This function selects a DataProxy node by randomly choosing a HostInfo.

The fields of the DataProxy node class need to be updated to add information about virtual nodes.

The hash ring and virtual nodes need to be implemented on the DataProxy side, and the strategy for acquiring DataProxy nodes on the SDK side must be updated at the same time.

rhizoma-atractylodis avatar Aug 15 '22 06:08 rhizoma-atractylodis