`wait_for_schema_agreement` fails when connection is closed
wait_for_schema_agreement picks a connection from the control connection without handling control connection reconnection, so it is possible to pick a closed connection.
https://github.com/scylladb/python-driver/blob/75970592758c07cd28e2bcadb77f2d10726a8285/cassandra/cluster.py#L4073-L4074
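A minimal, self-contained sketch of the race (stub classes, not the real driver code): a connection reference is captured before the control connection reconnects, and is already closed by the time it is used.

```python
# Illustrative stubs only; the real driver's ControlConnection is far
# more involved. This just shows the closed-connection failure mode.

class Connection:
    def __init__(self, host):
        self.host = host
        self.is_closed = False

    def query_schema_versions(self):
        # stand-in for querying system.local / system.peers
        if self.is_closed:
            raise ConnectionError(f"connection to {self.host} is closed")
        return {"version-1"}


class ControlConnection:
    def __init__(self):
        self._connection = Connection("10.0.0.1")

    def reconnect(self, new_host):
        # on reconnection the old connection is closed and replaced
        self._connection.is_closed = True
        self._connection = Connection(new_host)

    def wait_for_schema_agreement(self, connection=None):
        # problematic pattern: use whatever connection we were handed
        # (or the current one), with no handling for it being closed
        conn = connection or self._connection
        return conn.query_schema_versions()


cc = ControlConnection()
conn = cc._connection        # reference captured before reconnection
cc.reconnect("10.0.0.2")     # control connection moves to another node
try:
    cc.wait_for_schema_agreement(connection=conn)
except ConnectionError as e:
    print(e)                 # the captured connection is now closed
```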
This can be a problem when wait_for_schema_agreement is used as a public API, e.g. in scylladb's tests:
https://github.com/scylladb/scylla-dtest/blob/d4b534021036503fdcc3400ab5da9f3407c81cf3/dtest_class.py#L220
There is an optional connection argument in wait_for_schema_agreement; it is currently used only when we receive RESULT_KIND_SCHEMA_CHANGE, to ensure wait_for_schema_agreement runs on the same host where the statement was executed.
Maybe instead of a connection we can hand over a connection pool or session, to ensure correctness even if one particular connection is closed for some reason.
Refs: https://github.com/scylladb/scylladb/issues/25966
So, basically there are two cases:
- When it is run on the control connection, in which case we can wait until the control connection is reconnected; the problem there is that if the new connection lands on another node, the wait can fake-pass.
- When it is called from a regular connection that belongs to a session, in which case, if we provide the connection pool along with the connection and the connection is gone, we can pull a new one from the pool.
I think it makes sense to split the code in two: keep wait_for_schema_agreement in ControlConnection, add similar code on Session, and start calling Session.wait_for_schema_agreement in the second case.
This code would try to pull a connection from the same pool; if the pool is not available, it would go over all the hosts and check whether they agreed on the schema.
And add another API on Session that would pull the schema from all the hosts, for cases when the user wants to check whether the whole cluster agreed on the schema, which is needed to fix https://github.com/scylladb/scylladb/issues/21371 and https://github.com/scylladb/scylladb/issues/25966
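A hedged sketch of what such a Session-side split could look like. All names here (Session, HostConnectionPool, check_cluster_schema_agreement, borrow_connection) are illustrative stand-ins under stated assumptions, not the driver's actual classes or methods:

```python
# Sketch: prefer a live connection from the given pool; if the pool
# has no usable connection, fall back to checking every host directly.

class Host:
    def __init__(self, address, schema_version):
        self.address = address
        self.schema_version = schema_version

    def query_schema_version(self):
        # stand-in for: SELECT schema_version FROM system.local
        return self.schema_version


class HostConnectionPool:
    def __init__(self, host, connected=True):
        self.host = host
        self.connected = connected

    def borrow_connection(self):
        if not self.connected:
            raise ConnectionError(f"no connections to {self.host.address}")
        return self.host


class Session:
    def __init__(self, pools):
        self._pools = pools  # {address: HostConnectionPool}

    def wait_for_schema_agreement(self, pool=None):
        if pool is not None:
            try:
                conn = pool.borrow_connection()
                return self._agreement_via(conn)
            except ConnectionError:
                pass  # pool unusable: fall back to the all-hosts check
        return self.check_cluster_schema_agreement()

    def _agreement_via(self, conn):
        # stand-in for querying system.local and system.peers through
        # one connection: that node's view of all schema versions
        versions = {p.host.schema_version for p in self._pools.values()}
        versions.add(conn.query_schema_version())
        return len(versions) == 1

    def check_cluster_schema_agreement(self):
        # the proposed extra API: query system.local on every host
        versions = {p.host.query_schema_version()
                    for p in self._pools.values() if p.connected}
        return len(versions) == 1
```

The design point: a dead pool degrades to the cluster-wide check instead of failing on a single closed connection, and check_cluster_schema_agreement doubles as the public "did the whole cluster agree" API.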
Regarding the first case, it is only called from _refresh_schema, where it is needed to figure out whether it is time to pull a new schema or not.
A fake-pass will result in re-reading the schema when the schema is not agreed on, which can result in having an outdated schema.
Failing it will also result in having an old schema, since the schema will not be updated.
And throwing an error will result in a control connection reconnection and re-reading a schema that could also be outdated.
The best solution would be to read the schema from all the nodes in that case, but ControlConnection belongs to a Cluster, not a Session, which means there is no good way of doing it.
So, to address it, I would propose making ControlConnection.wait_for_schema_agreement return None when the new connection lands on another host, and logging a warning in ControlConnection._refresh_schema when that happens, but still reading the schema.
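A small sketch of that return-None-and-warn contract, assuming the check is reduced to plain functions for illustration (the real logic would live on ControlConnection and _refresh_schema):

```python
# Sketch of the proposal: None means "could not verify agreement from
# the original host's perspective"; the caller warns but still reads.
import logging

log = logging.getLogger(__name__)

def wait_for_schema_agreement(original_host, connection_host, versions):
    # If the control connection reconnected to a different host, we
    # cannot guarantee agreement from the original host's perspective:
    # return None instead of fake-passing.
    if connection_host != original_host:
        return None
    return len(versions) == 1  # True only if all versions match

def refresh_schema(agreed):
    # _refresh_schema-like caller: warn on None, but still read schema
    if agreed is None:
        log.warning("control connection moved to another host during "
                    "schema agreement wait; schema may be outdated")
    return "read-schema"  # the schema is read in every case
```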
I'm not sure I fully follow. There are two ways one can check schema agreement:
A. Query system.local on all nodes.
B. Query system.local and system.peers on one node.
The way I understand it, there are at least two cases when we want to perform a schema agreement wait:
- User performed DDL and RESULT_KIND_SCHEMA_CHANGE was returned. In that case we have to make sure that the coordinator for the request is included in the agreement. If we don't, there is a possibility that we see an agreement, but its schema does not yet include the newly executed DDL. So for method A we need to make sure that the coordinator is one of the nodes that responded, and with B (which the Python driver uses, afaik) we have to query using a connection to the coordinator. The solution proposed by @sylwiaszunejko seems reasonable: we can use any connection to such a node. If there are no available connections, then the schema await should fail; we can't fall back to querying other nodes.
- User requests such a wait using the public API of the driver. In that case, there are various reasonable semantics. "There is schema agreement from the perspective of the CC" and "all nodes the driver is connected to are in agreement" both sound reasonable, and are implemented by mechanisms B and A respectively. Currently the driver uses the former semantics. I don't think switching the CC mid-wait breaks this, so imo we can do this.
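The two checks (A and B) can be sketched over stubbed query results; the real driver issues CQL against system.local and system.peers, so everything below is illustrative:

```python
def check_agreement_a(local_versions):
    """Method A: SELECT schema_version FROM system.local on every node;
    agreement means all nodes report the same version.
    local_versions: {node_address: schema_version}"""
    return len(set(local_versions.values())) == 1

def check_agreement_b(local_version, peer_versions):
    """Method B: on one node, read its own version from system.local
    and its view of the other nodes from system.peers."""
    return set(peer_versions) == {local_version}

# Coordinator-inclusion caveat for DDL: with A, the coordinator must be
# among the responding nodes; with B, the queried node must be the
# coordinator itself.
cluster = {"10.0.0.1": "v2", "10.0.0.2": "v2", "10.0.0.3": "v1"}
print(check_agreement_a(cluster))             # False: v1 vs v2
print(check_agreement_b("v2", ["v2", "v2"]))  # True from this node's view
```

The last two lines illustrate the fake-pass risk: a single node's view (B) can report agreement while the cluster-wide check (A) does not.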
From what @dkropachev wrote, there is another scenario where the agreement wait is performed, and it has something to do with schema fetching on the CC, if I understand correctly. @dkropachev, could you elaborate more on this scenario (and the fake-passing you described)? I don't fully understand when it is used.
The schema parser also uses a single connection and nothing more than that. I assume it can fail midway as well.