[INLONG-5101][DataProxy] Optimize load balancing for DataProxy
fix #5101
Motivation
Based on my reading of the source code, the DataProxy SDK currently selects DataProxy nodes by polling (when sending messages in TCP mode) and by random selection (when sending messages in HTTP mode). The polling method is not efficient enough, and the random method does not readily achieve load balancing.
Changes
Use a consistent hashing algorithm in place of the original polling and random selection.
Mechanism Options
Consistent hashing algorithm with a virtual node mechanism. Refer to the article for details on the algorithm.
Design
Based on my reading of the source code, the following functions need to be modified:
org.apache.inlong.sdk.dataproxy.network.ClientMgr.getClientByRoundRobin(): This function obtains the DataProxy node by polling.
org.apache.inlong.sdk.dataproxy.http.InternalHttpSender.sendMessageWithHostInfo(List bodies, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit): This function selects a DataProxy node by picking a HostInfo at random.
The fields of the DataProxy node class need to be updated to carry information about virtual nodes.
The hash ring and virtual nodes need to be completed on the DataProxy side, and the strategy for acquiring DataProxy nodes on the SDK side must be updated at the same time.
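To make the proposed selection strategy concrete, here is a minimal sketch of a consistent hash ring with virtual nodes. It is illustrative only and is not the code in this PR: the class name ConsistentHashRing, the "host:port#i" virtual node naming, the MD5-based hash, and the use of groupId plus streamId as the lookup key are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a hash ring; not part of the InLong SDK. */
public class ConsistentHashRing {
    // Ring position -> physical node address (for example "host:port").
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    /** Places virtualNodes points on the ring for one physical node. */
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Removes only the ring points that still belong to this node. */
    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** Returns the first node clockwise from the key's position, or null if the ring is empty. */
    public String getNode(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Uses the first 8 bytes of an MD5 digest as the ring position (an assumed hash choice). */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real list would come from the DataProxy node configuration.
        ConsistentHashRing ring = new ConsistentHashRing(
                Arrays.asList("10.0.0.1:46801", "10.0.0.2:46801", "10.0.0.3:46801"), 100);
        // Assumed key: groupId and streamId joined, so one stream sticks to one node.
        System.out.println(ring.getNode("test_group|test_stream"));
    }
}
```

With a ring like this, removing one node only remaps the keys that were assigned to it, while keys on the remaining nodes keep their assignment. The number of virtual nodes per physical node (100 here) is an arbitrary value for the sketch; more virtual nodes generally give a more even spread at the cost of a larger ring.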