Memory leak of SQLAlchemy instances passed as argument to factories
Description
Factory_boy keeps SQLAlchemy instances passed as argument to factories in memory, causing a memory leak and changing what requests SQLAlchemy performs.
The changed behavior for SQLAlchemy, in our case, is additional SELECT statements. That's an issue for us because we count SQL queries in tests to ensure the absence of typical SQL bugs such as the n+1-query problem.
That being said, I would expect the memory leak to potentially cause other issues.
To Reproduce
Run the following with factory_boy 3.0.0 or 3.0.1:
import factory
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
Base = declarative_base()
class Org(Base):
__tablename__ = "orgs"
id = Column(Integer, primary_key=True)
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
org_id = Column(Integer, ForeignKey("orgs.id"))
org = relationship(Org)
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
class UserFactory(factory.alchemy.SQLAlchemyModelFactory):
class Meta:
model = User
sqlalchemy_session = session
def main():
UserFactory(org=Org(id=1))
session.commit()
engine.echo = "debug"
session.query(Org).filter_by(id=1).delete()
session.commit()
if __name__ == "__main__":
main()
When running this, you'll see an unexpected SELECT statement performed on the orgs table:
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine SELECT orgs.id AS orgs_id
FROM orgs
WHERE orgs.id = ?
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Col ('orgs_id',)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Row (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine COMMIT
The expected output is the following:
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine COMMIT
To see the memory leak, replace main by:
def main():
import gc
import weakref
org=Org(id=1)
org_weak = weakref.ref(org)
UserFactory(org=org)
session.commit()
del org
gc.collect()
print(org_weak)
You'll see the following output:
<weakref at 0x7fd11dba6b30; to 'Org' at 0x7fd11dd7cb80>
That shows that Org is still in memory although it should have been collected.
Expected output:
<weakref at 0x7f34150d2c20; dead>
Notes
I bisected the problem to PR #658. More precisely, it seems to be the assignment cls._original_params = params in the _generate method that causes the memory leak. Therefore, the problem was introduced in version 3.0.0.
I suspect the problem is also present in the Django model factory.
Hi! Thanks a lot for the bug report; this is indeed a nasty bug.
The memory leak should indeed be present with Django as well, but won't have as big an impact there. I'll look into it.
I tested this again but with more recent versions of SQLAlchemy and found the following:
- The memory leak seems to still exist with any version of SQLAlchemy and factory_boy >= 3.
- The extra
SELECTdisappears with SQLAlchemy >= 1.3.21 and any version of factory_boy.- I suspect the fix is https://docs.sqlalchemy.org/en/14/changelog/changelog_13.html#change-377e70b3f06fb4b100c8a4182baf138c but I'm not certain.
As a result, the impact of this issue is much lower for us, since we only use factory_boy in tests, but I suppose the extra memory reference could still be an issue for other users.