ERROR: cached plan must not change result type
Is there a way to invalidate the query plan cache when this error occurs? Some other drivers have options to deal with this scenario.
This is caused by migrating the database while an sqlx connection pool is running, changing the schema in such a way that the result type no longer matches the cached query plan. This is problematic in a production environment if you have to restart the application servers after every migration.
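For illustration, a minimal sketch of the failure mode (the connection string and table are placeholders; sqlx caches the prepared statement on first use):

```rust
use sqlx::{Connection, PgConnection};

async fn reproduce() -> Result<(), sqlx::Error> {
    let mut conn = PgConnection::connect("postgres://localhost/mydb").await?;

    // First run: sqlx prepares `SELECT * FROM users` and caches the
    // prepared statement on this connection.
    sqlx::query("SELECT * FROM users").fetch_all(&mut conn).await?;

    // A migration (from this session or any other) changes the row
    // type that the cached plan produces.
    sqlx::query("ALTER TABLE users ADD COLUMN nickname TEXT")
        .execute(&mut conn)
        .await?;

    // Second run reuses the cached statement and Postgres rejects it:
    // "ERROR: cached plan must not change result type".
    sqlx::query("SELECT * FROM users").fetch_all(&mut conn).await?;
    Ok(())
}
```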
Hey, I ran into the same issue and had success with this: https://www.postgresql.org/docs/9.4/sql-discard.html
You can use https://docs.rs/sqlx/0.5.5/sqlx/trait.Connection.html#method.clear_cached_statements to reset ALL cached statements. This should definitely be run after any DDL if you have any open connections.
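For example (a sketch; the DDL and function name are placeholders), a migration routine could flush the cache on the connection right after running the DDL:

```rust
use sqlx::{Connection, PgConnection};

// Sketch: run a schema migration, then drop every statement cached on
// this connection so subsequent queries are re-prepared against the
// new schema. Note this only affects the one connection it is called on.
async fn migrate_and_flush(conn: &mut PgConnection) -> Result<(), sqlx::Error> {
    sqlx::query("ALTER TABLE users ADD COLUMN nickname TEXT")
        .execute(&mut *conn)
        .await?;

    conn.clear_cached_statements().await?;
    Ok(())
}
```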
However, we don't have an easy way to run this automatically, or even for all connections in the pool. Some ideas:
- A method on `Pool` to run a function on every connection once: `pool.for_each_connection(|conn| conn.clear_cached_statements());`
- A method on `Pool` to discard all open connections and reconnect: `pool.recycle()`.
Thanks @mehcode. I realised that this doesn't clear all connections. This leads to my question: how does this behave if I have load-balanced servers connected to the same database? i.e. how can I notify all the servers of the changes?
In my particular case, I have a system that helps users alter different tables from time to time. Do you know a way to discard the cached plan for a specific table/schema? If not, is there a way to disable the plan cache in sqlx, or any better way to let users adjust the schema dynamically?
Thanks in advance.
Hey @mehcode, not sure if it has any value, but Rails has the same issue, and what they do is clear the cache and retry the query:
https://github.com/rails/rails/blob/main/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L725-L737
For transactions, they need to handle it a bit more carefully: they have to roll back the transaction first before clearing the cache:
https://github.com/rails/rails/blob/8db23109bf61052eef437629b6ef27a94e0b5bd9/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb#L323-L327
+1000 to this, it's a giant footgun: you run a migration and your sqlx app goes completely down until you restart it. I haven't seen any other Postgres library in any language fail like this.
It would be great if sqlx could detect stale cached statements and automatically recover.
A workaround is setting `.statement_cache_capacity(0)` in `PgConnectOptions`, but that comes with a significant performance hit.
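For reference, a minimal sketch of that workaround (connection string and pool settings are placeholders):

```rust
use sqlx::postgres::{PgConnectOptions, PgPoolOptions};

async fn connect() -> Result<sqlx::PgPool, sqlx::Error> {
    // Capacity 0 disables the per-connection statement cache entirely,
    // so every query is re-prepared on each execution. This sidesteps
    // the stale-plan error, at a real performance cost.
    let options = "postgres://localhost/mydb"
        .parse::<PgConnectOptions>()?
        .statement_cache_capacity(0);

    PgPoolOptions::new().connect_with(options).await
}
```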
We can work on ways to address this. It could be an option on `ConnectOptions` to tell the connection to automatically flush its prepared statements if it encounters this error, instead of emitting it. It's really just a sanity check.
However, at the same time I'm wondering if this is a symptom of a deployment procedure issue. If you push a migration that changes the output of a query, isn't that likely to break your application anyway? This is why we prefer to embed migrations in the application binary, so that schema changes always coincide with an app upgrade. We never run into this error in our deployments.
> If you push a migration that changes the output of a query, isn't that likely to break your application anyway?
This is a system where the main app that manages migrations is written in Go, and there's a Rust service reading only a handful of fields from a particular table. We ensure the main Go application never breaks these fields for the Rust service. Restarting the Rust service for every Go deployment is not an option: it's an IoT broker handling long-lived connections, and its whole purpose is to keep these connections alive across application deployments.
I can imagine cases where this is an issue even with a single Rust service: if you want to do zero-downtime deployments, you keep the old instances around while the new instances are starting up and running the migrations (making sure migrations are "backwards compatible", of course). This bug will break the old instances, preventing the deployments from being zero-downtime.
We're running into this issue and are doing exactly what the previous comment is suggesting: we do zero-downtime deployments by keeping old instances around while new instances are starting up. We in fact pulled the migrations out of our main service instances and run them from a separate binary to gain more control over when they happen.
It would be great if sqlx could address this internally. I'm also still wondering about the scope of when this happens. For example, we're currently seeing it when adding a new column that is not used by any query. Specifically, I wonder if we could work around this issue by avoiding SELECT * (and RETURNING *) clauses.
If someone wants to give me some guidance on where this change could fit in and what the best way is to specifically distinguish this error, I would probably be able to contribute changes in this direction.
> Specifically, I wonder if we could work around this issue by avoiding SELECT * (and RETURNING *) clauses.
That's correct. You should only be seeing that error if you've made changes that affect the query plan of existing queries.
Looking back at this issue, there isn't any reason why we shouldn't capture that error and re-prepare the query. Similar to what's described here: https://github.com/launchbadge/sqlx/issues/1250#issuecomment-878249285
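Until something like that lands, here's an application-side sketch of the Rails-style recovery: match on SQLSTATE 0A000 (the code Postgres uses for this error), flush the cache, and retry once. The helper itself is hypothetical; `clear_cached_statements` and `DatabaseError::code` are existing sqlx APIs:

```rust
use sqlx::{postgres::PgRow, Connection, Error, PgPool};

// Hypothetical helper: if a query fails because the cached plan went
// stale (SQLSTATE 0A000), clear the acquired connection's statement
// cache and retry once. Limitation: other connections in the pool may
// still hold stale statements until they hit the same path.
async fn fetch_all_with_retry(pool: &PgPool, sql: &str) -> Result<Vec<PgRow>, Error> {
    match sqlx::query(sql).fetch_all(pool).await {
        Err(Error::Database(e)) if e.code().as_deref() == Some("0A000") => {
            let mut conn = pool.acquire().await?;
            conn.clear_cached_statements().await?;
            sqlx::query(sql).fetch_all(&mut *conn).await
        }
        other => other,
    }
}
```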
Hey, I would also be really interested in a solution here. We change our tables dynamically quite often (it's a low-code platform), and with that we're now getting this error more and more...
This issue seems to have been open for a while and is increasingly affecting a number of users, including @Roba1993 and myself. Given the significant impact it has on production environments, especially for those of us with frequent changes to database schemas, it would be beneficial to prioritize a solution. Right now we have to set `.statement_cache_capacity(0)`, which is ... suboptimal.
@mehcode Is there a chance to get this error/request into your backlog within launchbadge?