flask-sqlalchemy
flask-sqlalchemy copied to clipboard
Our implementation of binds can cause table name conflicts
This is a description of the issue that #222 tries to solve. After investigating further, and based on differences between SQLAlchemy 1.3 and 1.4, I don't think I'll be able to merge that PR so I'm writing up more here.
We have the SQLALCHEMY_BINDS config mapping keys to engine URLs. Each model can have an optional __bind_key__ attribute. We override Session.get_bind to look for this key and choose the engine from the config. In this way, you can define models that are present in different databases.
This is slightly different than SQLAlchemy itself. There the Session(binds={}) maps classes to engines. So a specific model could have a specific engine, or a base class could be used to map all its models to the same engine. The configuration is done when setting up the session and engines, not when defining the models and tables.
Our way causes issues when two models with the same name / table name are defined that belong to separate binds. The Model base class has one metadata associated with it, and all names registered with a metadata must be unique. It also makes it possible to write foreign keys between models that will end up using separate engines, which won't work. When using plain SQLAlchemy, you would create separate declarative bases and bind each of them to a different engine, but in Flask-SQLAlchemy there is only one db.Model base class.
#222 addresses this by creating a different metadata in the metaclass when creating a model with a different bind key. I wasn't super comfortable with that, but the release of SQLAlchemy 1.4 made it more clear why. In 1.4, it uses a new registry object, and Base.metadata is essentially an alias to Base.registry.metadata. Looking back at how 1.3 did it, Base._decl_class_registry was the equivalent, and it wasn't being overridden by #222, so you'd still have names overwriting each other in the registry if not the metadata. With 1.4, we'd need to override registry, not metadata directly, and looking at #222's current implementation this seems even more complex and messy. And if we want to support SQLAlchemy <= 1.3 we need to detect and override both the old and new implementations.
The problem is that our __bind_key__ is only available after class creation has started, but metadata/registry was created before that when creating the declarative base. There's a disconnect between when we have the information to know what to create, and when it needs to be created. Maybe it could be addressed with metaclass trickery to substitute a different base class when creating a subclass with a different key, but I haven't investigated very far and I'm not particularly enthusiastic about trying to deal with that complexity.
Alternatively, we could make db.make_declarative_base more of a public API (or a new method) and have it take a bind_key parameter. So when you want to use a separate bind, instead of inheriting db.Model, inherit db.get_base_model("key"). We'd probably want to disallow setting __bind_key__ manually and show a message saying what to do instead, except there are also valid reasons to have models in the same metadata use different binds, as long as there's no conflicts. However, this seems pretty confusing to teach users, I foresee it causing a bunch of new questions even as it solves the current problem.
@zzzeek Do you have any comments on how we could approach using a separate metadata per __bind_key__?
hey -
I haven't looked at the details but the general problem, each class has __bind_key__ on it, and you want to override Session.get_bind() to look at __bind_key__, then look inside a global config somewhere, and return an engine.
From that problem description alone, I don't see how .metadata or .registry should have anything to do with it. Session.get_bind() is passed the Mapper in question, you would look at mapper.class_.__bind_key__, and get your engine. As stated this doesn't have anything to do with the "name" of a class, you get the actual class object.
It seems like the problem is more complicated than that but i would definitely stay away from generating multiple MetaData objects implicitly and there should be no need to override what "registry" is.
It also makes it possible to write foreign keys between models that will end up using separate engines, which won't work. When using plain SQLAlchemy, you would create separate declarative bases and bind each of them to a different engine, but in Flask-SQLAlchemy there is only one db.Model base class.
can you illustrate this? i dont really understand the problem.
Our Session.get_bind code works for basic uses, but seems to break down in a number of cases, and it seems like it's related to metadata. There are a bunch of different issues open about binds: https://github.com/pallets/flask-sqlalchemy/issues?q=is%3Aissue+is%3Aopen+binds, it's hard to summarize. Things with compiler, events, overlapping table names, etc. all seem to be affected by the fact that we're not separating models into different metadatas.
The easiest issue to demonstrate is that since we're using a single db.Model base class that has a single metadata, you can't define two models with the same table name but different bind keys, because the metadata wants unique table names. https://github.com/pallets/flask-sqlalchemy/issues/26#issuecomment-2332423 pointed out that issue back in 2011 right after our binds stuff was added.
class LocalUser(db.Model):
__tablename__ = "user"
class OtherUser(db.Model):
__tablename__ = "user"
__bind_key__ = "other"
These models are in separate databases (default and "other"), but since they have the same table name the metadata raises an error:
sqlalchemy.exc.InvalidRequestError: Table 'genres' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
In plain SQLAlchemy, I would make Base = declarative_base() and OtherBase = declarative_base() and inherit from the separate bases, so this wouldn't come up. But we only have the single db.Model base class to inherit from.
oh OK, I see, so you want to likely have metadata per bind_key. This is fine, ~you can assign .metadata to each class directly and that will supersede the .metadata that's on a the registry()~ OK that's a regression, that will be fixed in https://github.com/sqlalchemy/sqlalchemy/issues/6128 . it's a documented use case https://docs.sqlalchemy.org/en/14/orm/declarative_config.html#abstract that was not supposed to break.
That code for generating the table was the one thing I noticed that made me reconsider whether overriding registry or metadata in subclasses was a good idea. I hadn't seen those docs either, good to know that it's a current public API.
What about having two models with the same class name? Maybe I should be replacing the registry too (or _decl_class_registry in 1.3), otherwise lazy loading from string names might pick the wrong model.
# module a
class User(db.Model):
...
# module b
class User(db.Model):
__bind_key__ = "other"
# another model
class Address(db.Model):
user = db.relationship("User") # might pick b.User even though a.User is the correct bind key
allowing non-unique model names seems like a bad idea... i'd rather have people be explicit and name one of the models e.g. OtherUser
If overriding registry isn't allowed that's fine, we can always tell users to make sure their class names are unique regardless of bind, which they should probably do anyway. Just figured I'd mention it since it's a similar issue. It might be that that already does/doesn't work, I'm not super familiar with how all the declarative classes get registered.
It's related to the other problem I briefly mentioned where you can define foreign keys across binds that won't break until you try to query.
the older declarative base and the registry both qualify classes by their module name so you can have same-named classes in different modules.
Thanks for all your help, I'm much more confident about implementing this now.
have any progress ?
any progress on this issue?
Hey there, I just wanted to add that using a single metadata for multiple binds causes issues also with some data types that subscribe to its events.
An example is Enum, which registers two callbacks to create and drop the data type on metadata's before_create and after_drop events.
As a result, when two tables with different binds use such types, db.drop_all() stops working because each time it drops all the tables of a bind (src) the after_drop event on the metadata is emitted, and we try to drop data types that are still in use by the tables of the other binds.