python-driver

Session not reconnected after rolling upgrade

Open Jadw1 opened this issue 1 year ago • 3 comments

Observed in https://github.com/scylladb/scylla-enterprise/pull/4634#issuecomment-2333883650.

The test runs with force_gossip_topology_changes: true, so auth is not managed via Raft and auth data is stored in the system_auth keyspace with the default replication factor of 1. The test fails roughly once every several runs. It performs a rolling upgrade, but sometimes the driver is not connected to some of the nodes after the rolling upgrade has finished (all nodes are up).
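One way to make "not connected to some of the nodes" concrete is to compare the hosts the driver knows about with the hosts it currently holds a connection pool for. The helper below is only a sketch: the name hosts_missing_pool is hypothetical, and the commented driver calls are an assumption about how the two lists could be collected from the session in this test.

```python
from typing import Iterable, List


def hosts_missing_pool(all_hosts: Iterable[str], pooled_hosts: Iterable[str]) -> List[str]:
    """Return hosts that are known to the cluster metadata but have no pool.

    Both arguments are plain address strings so the check stays independent
    of the driver objects.
    """
    pooled = set(pooled_hosts)
    return sorted(h for h in all_hosts if h not in pooled)


# Assumed usage with a live session (not verified against this test setup):
#   all_hosts = [str(h.endpoint) for h in cql.cluster.metadata.all_hosts()]
#   pooled_hosts = [str(h.endpoint) for h in cql.get_pool_state()]
#   assert not hosts_missing_pool(all_hosts, pooled_hosts)

# Standalone example with made-up addresses:
missing = hosts_missing_pool(
    ["127.0.0.1", "127.0.0.2", "127.0.0.3"],
    ["127.0.0.1", "127.0.0.3"],
)
print(missing)
```

Running such a check after rolling_restart would show which nodes the session never came back to, instead of relying on a later query failing.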

Reproducer:

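import time
import pytest
# Import paths assumed from the scylladb test framework (test/pylib).
from test.pylib.manager_client import ManagerClient
from test.pylib.util import wait_for_cql_and_get_hosts

@pytest.mark.asyncio
async def test_rolling_restart_with_auth(manager: ManagerClient):
    # Gossip-based topology keeps auth data in system_auth (RF=1).
    config = {
        'force_gossip_topology_changes': True,
    }
    servers = [await manager.server_add(config=config) for _ in range(3)]
    cql = manager.get_cql()
    # Wait until the driver sees all three servers.
    hosts = await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)

    # Restart the nodes one at a time.
    await manager.rolling_restart(servers)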

I ran the reproducer in the test/auth_cluster suite (authentication enabled): https://github.com/scylladb/scylladb/blob/master/test/auth_cluster/suite.yaml

During the upgrade, the driver cannot authenticate if the replica that owns the part of the token ring holding the user data is down (system_auth has RF=1). However, the driver does not reconnect to the node after it comes back up.
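The RF=1 effect can be illustrated with a toy token ring. This is a simplified sketch, not Scylla's actual placement code: the tokens, node names and the role_token value are made up, and consistency levels are ignored (authentication is treated as possible whenever at least one replica of the role row is alive).

```python
from bisect import bisect_left
from typing import Dict, List, Set


def replicas_for(token: int, ring: Dict[int, str], rf: int) -> List[str]:
    """Walk the ring clockwise from `token` and collect `rf` distinct nodes."""
    tokens = sorted(ring)
    # The first node token >= the row token is the primary; wrap around at the end.
    start = bisect_left(tokens, token) % len(tokens)
    owners: List[str] = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


def can_authenticate(token: int, ring: Dict[int, str], rf: int, down: Set[str]) -> bool:
    """True if at least one replica holding the role row is still up."""
    return any(node not in down for node in replicas_for(token, ring, rf))


ring = {0: "node1", 100: "node2", 200: "node3"}  # hypothetical 3-node ring
role_token = 150  # hypothetical token of the user's row in system_auth

# RF=1: the row lives only on node3, so restarting node3 blocks authentication.
print(can_authenticate(role_token, ring, rf=1, down={"node3"}))
# RF=1: restarting any other node does not affect this user.
print(can_authenticate(role_token, ring, rf=1, down={"node1"}))
# RF=3: every node holds the row, so one node being down is tolerated.
print(can_authenticate(role_token, ring, rf=3, down={"node3"}))
```

In this model the failure is intermittent by construction: it depends on whether the driver tries to authenticate while the single owner of the role row is the node being restarted, which matches the test failing only once every several runs.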

pytest.log

Jadw1 avatar Sep 10 '24 11:09 Jadw1

Cc @piodul @Lorak-mmk

Jadw1 avatar Sep 10 '24 11:09 Jadw1

@scylladb/drivers-team

dkropachev avatar Sep 10 '24 13:09 dkropachev

Should it reconnect? If so, how should that happen?

In any case, I don't think system_auth with RF=1 is a relevant case.

roydahan avatar Feb 20 '25 19:02 roydahan