Optimize CachinSession
CachingSession is currently suboptimal, especially when using it &'static str (and I suppose it's the main use case), mostly for two reasons:
-
Into<Query>implies a string cloning in case of&str; however, making the query is not necessary to perform cache lookup, only the string reference is necessary - cache stores raw
PreparedStatements, which have to be cloned when retrieved (andPreparedStatementis quite heavy to clone).
I've implemented an optimization to solve this issue, in which I've made the following modifications:
- introduce
trait QueryString {fn query_string(&self) -> &str;}and usequery: impl Into<Query> + QueryStringinCachingSession::Execute(and others) signature; cache lookups are performed usingquery.query_string(), andquery.into()is only used when the statement must be prepared - store
Arc<PreparedStatement>into the cache
I've done some benchmarks (using my old i7 4650U MacBook Air), in which I've measured only one prepared statement retrieval (no serialization and database access) from &'static str. Here are the results :
| query string size | current | optimized |
|---|---|---|
| 20 | 390 ns | 205 ns |
| 100 | 450 ns | 250 ns |
I let you judge if the optimization could be worth it. The implementation is just a draft and maybe adapted (I can open a PR).
Kind of duplicate of #475, but implementations are quite different
Hey, I was about to open an issue around the fact that &'static str is the default case in most instances, or at least it should be (since if you're building queries with format strings you're potentially going to have injections), and therefor Query should accept a &str. This is particularly nice if you're calling the same query many times but with different parameters.
I also agree that the cache should hold Arc<PreparedStatement> to make it cheaper to clone.
I have a couple of notes on your code: https://github.com/wyfo/scylla-rust-driver/blob/optimize_caching_session/scylla/src/transport/session.rs#L595
prepare is a bit unfortunate in that it takes impl Into<Query> but never actually needs ownership of Query. That's because Connection::send_request borrows the request. So at no point is an owned buffer actually necessary at all.
The main benefit of taking impl Into<Query> is that it lets you provide static strings, but it means that every called to impl Into<Query> is forcing an allocation. That's still happening with the approach you've taken.
This is why I went down the route of making impl Into<Query> cheap. Unfortunately, I've lost that code. Whoops.
I'd like to hear from the Scylla devs on this, if there's any interest or not. I think there are a few things to do here:
-
Moving PreparedStatement into Arc should likely be its own PR, it should be a very straightforward win. There is a clone of PreparedStatement for every
add_prepared_statementcall, whether it was previously prepared or not.PreparedStatementis a very large structure that itself has many pointers, includingArc<dyn _>pointers, so this should just strictly be a win. -
Determining what amount of API breakage, if any, is acceptable. In the approach I took I had a
Cow<'a, String>in order to minimize API breakage. I think optimizing for the&'staticor&'acases makes the most sense for two reasons.
- Most queries should be static strings, or otherwise references
- At no point does Scylla's driver require owning the bytes. This makes sense when you consider that the query is going to be compressed into a separate buffer, meaning it never needs to manage the lifetime internally
The changes to to avoid those clones and copies would be:
- Changing
QuerytoQuery<'a> - Modifying
Session::prepareto not require ownership of the Query. This will help in the case where aStringis used as the buffer.
I took a quick pass at redoing the work. https://github.com/scylladb/scylla-rust-driver/compare/main...colin-grapl:scylla-rust-driver:remove-allocations
This is a breaking API change. In particular, if you were calling .into() on queries before passing them into execute, you have to not do that. I have also added lifetimes to a number of structs.