java-driver
java-driver copied to clipboard
Don't return empty routing key when partition key is unbound
DefaultBoundStatement#getRoutingKey has logic to infer the routing key when no one has explicitly called setRoutingKey or otherwise set the routing key on the statement. It however doesn't check for cases where nothing has been bound yet on the statement. This causes more problems if the user decides to get a BoundStatementBuilder from the PreparedStatement, set some fields on it, and then copy it by constructing new BoundStatementBuilder objects with the BoundStatement as a parameter, since the empty ByteBuffer gets copied to all bound statements, resulting in all requests being targeted to the same Cassandra node in a token-aware load balancing policy.
More related arguments we could make since we can re-bind the partition key:
- we should never copy over
getRoutingKey
while copying thetemplate
at StatementBuilder:71 - OR we should copy over the raw instance variable, not the computed value
- OR we should only generate the routing key when evaluating the statement in the load balancing policy, if
Request#getRoutingKey
returns null at BasicLoadBalancingPolicy:290
Between all of those, the simplest solution is probably 1. The second would be nice too but there's no easy way to get the instance variable (save casting to a DefaultBoundStatement and perhaps exposing another getter), and I can't think of a non-brittle way to do the third one.
Worth nothing that as far as I can tell, DefaultBoundStatement#getRoutingKeyspace
doesn't have this problem because the field that's used to calculate its value (the prepared statement) can't change.
In any case, probably a different PR for that follow-up change?
Thanks for this @akhaku , and my apologies for taking so long to get back to you... holidays and all that! This change looks pretty good; I had a few comments on your current implementation but I don't think any of them are major. I haven't yet given a great deal of thought to your points about handling the cases around the builder... I want to think about that a bit more.
No worries, thanks for looking at this! The other comments are much less urgent, I think this is a somewhat serious bug (if you happen to hit it) and so good to get this fixed before worrying about my other comments about cleanup.
Gentle nudge @absurdfarce :)
Hey @akhaku, apologies for the delay in getting back to you!
I'm inclined to agree with your analysis above; I think we're probably okay to merge this change as it stands and address the questions copying values in another PR. With that in mind I think this PR is ready to go as it stands so I'll try to get it merged later today and create a follow-up ticket for the other questions.
Thanks for all your work on this!
Hey @akhaku, regarding the follow-up ticket... I think I need to understand the case in question a bit better.
You mentioned above that we can "re-bind the partition key" for the DefaultPreparedStatement referenced by a given DefaultBoundStatement. For my own understanding... is the concern here that the List returned by getPartitionKeyIndices() is mutable? Or are you referring to another operation that I'm just not thinking of at the moment?
I'm also a bit confused by your (correct) point that DefaultBoundStatement.getRoutingKeyspace() uses a field (preparedStatement) in DefaultBoundStatement which cannot change. What confuses me is that getRoutingKey() references the same field if a fixed routing key wasn't supplied when the instance was constructed. So the only difference there is a difference in return value (ColumnDefinitions via getVariableDefinitions() in the routing keyspace case, a List of Integers via getPartitionKeyIndices in the routing key case).... which returns me to my earlier question about mutability.
All of that said, the second option on your list ("we should copy over the raw instance variable, not the computed value") seems most desirable (at least given my current understanding). If the user has explicitly specified a routing key preserve it, otherwise allow it to be computed as it normally would.
Just a rebase. It's been a while since I looked at this, so based on my fuzzy recollection of what I was talking about:
Regarding rebinding the partition key: you can use SettableByName
or SettableById
to reset the partition key after it's already set. If we had the old routing key cached, then after changing the bound parameter, the cached routing key is no longer accurate, we need to recompute the routing key based on the new bound partition key.
That also sort of answers your question about DefaultBoundStatement#getRoutingKeyspace
- while we can change the underlying information that's used to calculate the routing key (the bound parameter for the partition key), we cannot change the parameter that is used to figure out the routing keyspace.
This has been sitting for far too long and the core change here is a good one. We can and should continue any discussion around the handling of getRoutingKey() in DefaultBoundStatement but I want to make sure the core change of this PR gets into 4.19.0.