CassandraVerifier#waitForSchemaVersions blocks on Cassandra 3 upgrade

Open leonz opened this issue 4 years ago • 3 comments

https://github.com/palantir/atlasdb/blob/develop/atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraVerifier.java#L213

The CassandraVerifier waits for schema agreement, which is generally a smart thing to do when creating a keyspace. However, during a Cassandra 3 upgrade, the schema versions will be misaligned until all of the nodes are fully upgraded. This can potentially take as long as the time to rewrite every sstable.

While that function is waiting, services cannot be started and backups cannot be taken, so the current behavior is not tenable for the duration of the upgrade.

Some paths forward:

  • Remove this check entirely. Need to understand dangers here.
  • Disable this check via config. Since config is set at the client level, this could potentially be enabled/disabled dynamically. Only disable for the Cassandra 3 upgrade. What happens if services try to create keyspaces/upgrade schemas at this time?
  • Find another form of schema compatibility check that doesn't flag on Cassandra 3 upgrade. If it doesn't exist, write it into Cassandra?
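
For the second option, a minimal sketch of what a config-controlled check could look like. Everything here is hypothetical: `SchemaAgreementGate`, the `checkEnabled` supplier, and the `waitForAgreement` callback are illustrative names, not existing AtlasDB APIs. The flag is read on every call so that a live-reloaded config value would take effect without a restart.

```java
import java.util.function.BooleanSupplier;

// Hypothetical wrapper; not part of AtlasDB.
class SchemaAgreementGate {
    // Assumed to be backed by live-reloadable client config.
    private final BooleanSupplier checkEnabled;
    // Stands in for the real blocking schema agreement check.
    private final Runnable waitForAgreement;

    SchemaAgreementGate(BooleanSupplier checkEnabled, Runnable waitForAgreement) {
        this.checkEnabled = checkEnabled;
        this.waitForAgreement = waitForAgreement;
    }

    /** Runs the blocking check only if enabled. Returns true if it ran, false if skipped. */
    boolean runIfEnabled() {
        if (!checkEnabled.getAsBoolean()) {
            return false; // skipped, e.g. for the duration of the Cassandra 3 upgrade
        }
        waitForAgreement.run();
        return true;
    }
}
```

This only shows the skip itself; it does not answer what should happen if a service tries to create a keyspace or change a schema while the check is off.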

leonz, Mar 30 '20 14:03

Are they misaligned in a decidable way?

j-baker, Mar 31 '20 10:03

In theory it should be possible to determine which nodes are on which version of Cassandra, and to correlate that with the schema version mismatch to decide whether the upgrade is the cause. Example (the top three nodes are on C* 3):

WARN  [2020-03-31T15:50:08.874099Z] com.palantir.atlasdb.keyvalue.cassandra.CassandraVerifier: Couldn't use host {} to create keyspace. It returned exception "{}" during the attempt. We will retry on other nodes, so this shouldn't be a problem unless all nodes failed. See the debug-level log for the stack trace. (host: xx) (exceptionMessage: java.lang.IllegalStateException: Cassandra cluster cannot come to agreement on schema versions, while checking if schemas diverged on startup.
At schema version c95060a4-47f9-3a58-b230-808818ba043c:
        Node: 1.x.x.36
        Node: 1.x.x.224
        Node: 1.x.x.186
At schema version e1243782-0562-3675-869c-de8ff87e799d:
        Node: 1.x.x.69
        Node: 1.x.x.118
        Node: 1.x.x.160
        Node: 1.x.x.66
        Node: 1.x.x.155
        Node: 1.x.x.234
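
A rough sketch of that correlation, assuming the schema-version-to-nodes map from the error above is available and that each node's Cassandra release version can be obtained by some other means (not shown here). `SchemaSplitCheck`, `splitExplainedByUpgrade`, and the release strings `"3.x"` and `"pre-3"` are all hypothetical placeholders. The split counts as "explained" only if every schema version is held by nodes on a single release and no two schema versions share a release.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of AtlasDB or Cassandra.
class SchemaSplitCheck {
    static boolean splitExplainedByUpgrade(
            Map<String, List<String>> nodesBySchemaVersion,
            Map<String, String> releaseVersionByNode) {
        if (nodesBySchemaVersion.size() < 2) {
            return false; // no disagreement to explain
        }
        Set<String> releasesSeen = new HashSet<>();
        for (List<String> nodes : nodesBySchemaVersion.values()) {
            Set<String> releasesInGroup = new HashSet<>();
            for (String node : nodes) {
                String release = releaseVersionByNode.get(node);
                if (release == null) {
                    return false; // unknown node: cannot attribute the split
                }
                releasesInGroup.add(release);
            }
            // Each schema version must map to exactly one release,
            // and that release must not belong to another schema version.
            if (releasesInGroup.size() != 1 || !releasesSeen.addAll(releasesInGroup)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> schemas = new HashMap<>();
        schemas.put("c95060a4-47f9-3a58-b230-808818ba043c",
                Arrays.asList("1.x.x.36", "1.x.x.224", "1.x.x.186"));
        schemas.put("e1243782-0562-3675-869c-de8ff87e799d",
                Arrays.asList("1.x.x.69", "1.x.x.118", "1.x.x.160",
                        "1.x.x.66", "1.x.x.155", "1.x.x.234"));

        Map<String, String> releases = new HashMap<>();
        for (String node : schemas.get("c95060a4-47f9-3a58-b230-808818ba043c")) {
            releases.put(node, "3.x"); // placeholder release string
        }
        for (String node : schemas.get("e1243782-0562-3675-869c-de8ff87e799d")) {
            releases.put(node, "pre-3"); // placeholder release string
        }
        System.out.println(splitExplainedByUpgrade(schemas, releases));
    }
}
```

If any schema version group contained nodes on both releases, the method would return false, which suggests a real schema disagreement rather than an artifact of the upgrade.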

leonz, Mar 31 '20 15:03

This is actually slightly less of a concern than I originally thought.

The schema version is tied to the binary upgrade, not the sstable upgrade. So the window of impact is the time it takes to upgrade the binaries across the entire cluster, rather than the time to rewrite the sstables.

leonz, Mar 31 '20 17:03