ERROR: cached plan must not change result type
Is there a way to invalidate the query plan cache when this error occurs? Some other drivers have options to deal with this scenario.
This is caused by migrating the database while an sqlx connection pool is running, changing the schema in such a way that the result type no longer matches the cached query plan. This is problematic in a production environment if you have to restart the application servers after every migration.
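For illustration, a minimal sketch of the failure mode (the connection string and table are placeholders; sqlx caches the prepared statement on first use):

```rust
use sqlx::{Connection, PgConnection};

async fn reproduce() -> Result<(), sqlx::Error> {
    let mut conn = PgConnection::connect("postgres://localhost/mydb").await?;

    // First run: sqlx prepares `SELECT * FROM users` and caches the
    // prepared statement on this connection.
    sqlx::query("SELECT * FROM users").fetch_all(&mut conn).await?;

    // A migration (from this session or any other) changes the row
    // type that the cached plan produces.
    sqlx::query("ALTER TABLE users ADD COLUMN nickname TEXT")
        .execute(&mut conn)
        .await?;

    // Second run reuses the cached statement and Postgres rejects it:
    // "ERROR: cached plan must not change result type".
    sqlx::query("SELECT * FROM users").fetch_all(&mut conn).await?;
    Ok(())
}
```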
Hey, I ran into the same issue and had success with this: https://www.postgresql.org/docs/9.4/sql-discard.html
You can use https://docs.rs/sqlx/0.5.5/sqlx/trait.Connection.html#method.clear_cached_statements to reset ALL cached statements. This should definitely be run after any DDL if you have any open connections.
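For example (a sketch; the DDL and function name are placeholders), a migration routine could flush the cache on the connection right after running the DDL:

```rust
use sqlx::{Connection, PgConnection};

// Sketch: run a schema migration, then drop every statement cached on
// this connection so subsequent queries are re-prepared against the
// new schema. Note this only affects the one connection it is called on.
async fn migrate_and_flush(conn: &mut PgConnection) -> Result<(), sqlx::Error> {
    sqlx::query("ALTER TABLE users ADD COLUMN nickname TEXT")
        .execute(&mut *conn)
        .await?;

    conn.clear_cached_statements().await?;
    Ok(())
}
```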
However, we don't have an easy way to run this automatically, or even for all connections in the pool. Some ideas:
- A method on `Pool` to run a function on every connection once: `pool.for_each_connection(|conn| conn.clear_cached_statements());`
- A method on `Pool` to discard all open connections and reconnect: `pool.recycle()`.
Thanks @mehcode. I realised that this doesn't clear all connections. This leads to my question: how does this behave if I have load-balanced servers connected to the same database? i.e. how can I notify all the servers of the changes?
In my particular case, I have a system that helps users alter different tables from time to time. Do you know a way to discard the cached plan for a specific table/schema? If not, is there a way to disable the plan cache in sqlx, or any better way to let users adjust the schema dynamically?
Thanks in advance.
Hey @mehcode, not sure if it has any value, but Rails has the same issue, and what they do is clear the cache and retry the query:
https://github.com/rails/rails/blob/main/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L725-L737
For transactions, they need to handle it a bit more carefully: they have to roll back the transaction first before clearing the cache:
https://github.com/rails/rails/blob/8db23109bf61052eef437629b6ef27a94e0b5bd9/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb#L323-L327
+1000 to this, it's a giant footgun: you run a migration and your sqlx app goes completely down until you restart it. I haven't seen any other Postgres library in any language fail like this.
It would be great if sqlx could detect stale cached statements and automatically recover.
A workaround is setting `.statement_cache_capacity(0)` in `PgConnectOptions`, but that comes with a significant performance hit.
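For reference, a minimal sketch of that workaround (connection string and pool settings are placeholders):

```rust
use sqlx::postgres::{PgConnectOptions, PgPoolOptions};

async fn connect() -> Result<sqlx::PgPool, sqlx::Error> {
    // Capacity 0 disables the per-connection statement cache entirely,
    // so every query is re-prepared on each execution. This sidesteps
    // the stale-plan error, at a real performance cost.
    let options = "postgres://localhost/mydb"
        .parse::<PgConnectOptions>()?
        .statement_cache_capacity(0);

    PgPoolOptions::new().connect_with(options).await
}
```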
We can work on ways to address this. It could be an option on `ConnectOptions` to tell the connection to automatically flush its prepared statements if it encounters this error, instead of emitting it. It's really just a sanity check.
However, at the same time I'm wondering if this is a symptom of a deployment procedure issue. If you push a migration that changes the output of a query, isn't that likely to break your application anyway? This is why we prefer to embed migrations in the application binary, so that schema changes always coincide with an app upgrade. We never run into this error in our deployments.
> If you push a migration that changes the output of a query, isn't that likely to break your application anyway?
This is a system where the main app that manages migrations is written in Go, and there's a Rust service reading only a handful of fields from a particular table. We ensure the main Go application never breaks these fields for the Rust service. Restarting the Rust service for every Go deployment is not an option: it's an IoT broker handling long-lived connections, and its whole purpose is to keep these connections alive across application deployments.
I can imagine cases where this is an issue even with a single Rust service: if you want to do zero-downtime deployments, you keep the old instances around while the new instances are starting up and running the migrations (making sure migrations are "backwards compatible", of course). This bug will break the old instances, preventing the deployments from being zero-downtime.
We're running into this issue and are doing exactly what the previous comment is suggesting: we do zero-downtime deployments by keeping old instances around while new instances are starting up. We in fact pulled the migrations out of our main service instances and run them from a separate binary to gain more control over when they happen.
It would be great if sqlx could address this internally. I'm also still wondering about the scope of when this happens. For example, we're currently seeing it when adding a new column that is not used by any query. Specifically, I wonder if we could work around this issue by avoiding SELECT * (and RETURNING *) clauses.
If someone wants to give me some guidance on where this change could fit in and what the best way is to specifically distinguish this error, I would probably be able to contribute changes in this direction.
> Specifically, I wonder if we could work around this issue by avoiding SELECT * (and RETURNING *) clauses.
That's correct. You should only be seeing that error if you've made changes that affect the query plan of existing queries.
Looking back at this issue, there isn't any reason why we shouldn't capture that error and re-prepare the query. Similar to what's described here: https://github.com/launchbadge/sqlx/issues/1250#issuecomment-878249285
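Until something like that lands, here's an application-side sketch of the Rails-style recovery: match on SQLSTATE 0A000 (the code Postgres uses for this error), flush the cache, and retry once. The helper itself is hypothetical; `clear_cached_statements` and `DatabaseError::code` are existing sqlx APIs:

```rust
use sqlx::{postgres::PgRow, Connection, Error, PgPool};

// Hypothetical helper: if a query fails because the cached plan went
// stale (SQLSTATE 0A000), clear the acquired connection's statement
// cache and retry once. Limitation: other connections in the pool may
// still hold stale statements until they hit the same path.
async fn fetch_all_with_retry(pool: &PgPool, sql: &str) -> Result<Vec<PgRow>, Error> {
    match sqlx::query(sql).fetch_all(pool).await {
        Err(Error::Database(e)) if e.code().as_deref() == Some("0A000") => {
            let mut conn = pool.acquire().await?;
            conn.clear_cached_statements().await?;
            sqlx::query(sql).fetch_all(&mut *conn).await
        }
        other => other,
    }
}
```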
Hey, I would also be really interested in a solution here. We change our tables dynamically quite often (it's a low-code platform), and with that we're now getting this error more and more...
This issue seems to have been open for a while and is increasingly affecting a number of users, including @Roba1993 and myself. Given the significant impact it has on production environments, especially for those of us with frequent changes to database schemas, it would be beneficial to prioritize a solution. Right now we have to set `.statement_cache_capacity(0)`, which is ... suboptimal.
@mehcode Is there a chance to get this error/request into your backlog within launchbadge?