[BUG] Always select node from a small subset of all nodes for any random session_id

Open zghong opened this issue 2 weeks ago • 0 comments

Describe the bug

Regardless of the session_id value, chproxy only ever selects nodes from a small subset of the cluster's nodes.

To Reproduce

Run chproxy on 127.0.0.1:8090 with the following YAML config:

clusters:
  - name: test_cluster
    replicas:
      - name: "replica1"
        nodes:
          - 127.0.1.1:8123
          - 127.0.1.2:8123
      - name: "replica2"
        nodes:
          - 127.0.2.1:8123
          - 127.0.2.2:8123

Then run the following script:

#!/bin/bash

for session_id in $(seq 0 1000); do
    echo "select hostname();" | curl "http://default:<password>@127.0.0.1:8090?session_id=$session_id" -d @-
done

The script above only ever reaches 127.0.1.1 and 127.0.2.2; the other two nodes (127.0.1.2 and 127.0.2.1) are never selected.
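
The two nodes that are reached share one property: the replica index equals the node index (replica1/first node, replica2/second node). One possible explanation, which this report does not confirm, is that both indexes are derived from the same hash of the session_id. The sketch below illustrates that effect with plain integers standing in for hash values; `pick_correlated`, `pick_decorrelated` and `distinct_pairs` are hypothetical helpers, not chproxy code.

```bash
#!/bin/bash
# Hypothetical sketch, NOT chproxy's implementation.
# Integers 0..7 stand in for hash(session_id) values.

# Both indexes use the same hash value: "replica_idx node_idx".
pick_correlated() {
    local h=$1 replicas=$2 nodes=$3
    echo "$((h % replicas)) $((h % nodes))"
}

# The node index uses the quotient left over after choosing the replica,
# so the two indexes no longer move in lockstep.
pick_decorrelated() {
    local h=$1 replicas=$2 nodes=$3
    echo "$((h % replicas)) $(( (h / replicas) % nodes ))"
}

# List the distinct index pairs reached in a 2-replica, 2-node topology.
distinct_pairs() {
    local picker=$1 h
    for h in $(seq 0 7); do
        "$picker" "$h" 2 2
    done | sort -u | tr '\n' ';'
    echo
}

distinct_pairs pick_correlated     # prints: 0 0;1 1;
distinct_pairs pick_decorrelated   # prints: 0 0;0 1;1 0;1 1;
```

With two replicas of two nodes each, the correlated variant can only produce the pairs `0 0` and `1 1`, which would correspond to 127.0.1.1 and 127.0.2.2 in the config above. The decorrelated variant is just one way to reach all four pairs; using a second, independent hash would work as well.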

Expected behavior

chproxy should distribute session_id values across all nodes, so that every node can be selected.

Screenshots

No.

Environment information

  • chproxy version: 1.30.0.

Additional context

  • The getReplicaSticky and getHostSticky functions exhibit performance bottlenecks and logical inconsistencies.
  • By design, sticky sessions should route all requests with the same session_id to the same node, regardless of whether that node is active. However, the current implementation does not maintain this consistency when the selected node's active status changes during the session. Resolving this is difficult, especially in chproxy topologies with 2 or more replicas, and may require introducing distributed storage such as Redis. I have added a TODO in the code and will open another issue about this.
    • Example 1: A sticky session initially routes requests to node 127.0.1.1 based on its session_id. If 127.0.1.1 later becomes inactive, subsequent requests with the same session_id are incorrectly rerouted to another active node (e.g., 127.0.2.2) instead of remaining directed to 127.0.1.1.
    • Example 2: A sticky session should route to node 127.0.1.1 but initially selects 127.0.2.2 because 127.0.1.1 is inactive. When 127.0.1.1 later becomes active, subsequent requests with the same session_id are incorrectly switched to 127.0.1.1, breaking session stickiness.
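
Both examples can be reproduced with a small model of "preferred node, else next active node" selection. This is a sketch under the assumption that the fallback works roughly this way; `pick_sticky` and the `addr=up` / `addr=down` notation are hypothetical and do not come from chproxy, and only two candidate nodes are modelled for brevity.

```bash
#!/bin/bash
# Hypothetical sketch, NOT chproxy's implementation.
# pick_sticky HASH ADDR=STATE... prints the first node marked "up",
# starting the search at index HASH modulo the number of nodes.
pick_sticky() {
    local h=$1
    shift
    local nodes=("$@")
    local n=${#nodes[@]}
    local i entry
    [ "$n" -gt 0 ] || return 1
    for ((i = 0; i < n; i++)); do
        entry=${nodes[$(( (h + i) % n ))]}
        if [[ ${entry#*=} == up ]]; then
            echo "${entry%%=*}"   # address part before the "="
            return 0
        fi
    done
    return 1                      # no active node at all
}

# Same session hash, different node health:
pick_sticky 0 127.0.1.1=up   127.0.2.2=up   # prints 127.0.1.1
pick_sticky 0 127.0.1.1=down 127.0.2.2=up   # prints 127.0.2.2
```

Because the result depends on the health of the nodes at request time, the same session hash maps to different nodes before and after a status change. Keeping the mapping stable would need some record of the earlier choice, which is where shared storage comes in for multi-instance setups.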

zghong, Dec 07 '25 13:12