factory_boy icon indicating copy to clipboard operation
factory_boy copied to clipboard

Memory leak of SQLAlchemy instances passed as argument to factories

Open bbc2 opened this issue 5 years ago • 2 comments

Description

Factory_boy keeps SQLAlchemy instances passed as argument to factories in memory, causing a memory leak and changing what requests SQLAlchemy performs.

The changed behavior for SQLAlchemy, in our case, is additional SELECT statements. That's an issue for us because we count SQL queries in tests to ensure the absence of typical SQL bugs such as the n+1-query problem.

That being said, I would expect the memory leak to potentially cause other issues.

To Reproduce

Run the following with factory_boy 3.0.0 or 3.0.1:

import factory
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Org(Base):
    __tablename__ = "orgs"

    id = Column(Integer, primary_key=True)


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    org_id = Column(Integer, ForeignKey("orgs.id"))
    org = relationship(Org)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()


class UserFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = User
        sqlalchemy_session = session


def main():
    UserFactory(org=Org(id=1))
    session.commit()

    engine.echo = "debug"
    session.query(Org).filter_by(id=1).delete()
    session.commit()


if __name__ == "__main__":
    main()

When running this, you'll see an unexpected SELECT statement performed on the orgs table:

2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine SELECT orgs.id AS orgs_id
FROM orgs
WHERE orgs.id = ?
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Col ('orgs_id',)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Row (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine COMMIT

The expected output is the following:

2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine COMMIT

To see the memory leak, replace main by:

def main():
    import gc
    import weakref

    org=Org(id=1)
    org_weak = weakref.ref(org)
    UserFactory(org=org)
    session.commit()
    del org
    gc.collect()
    print(org_weak)

You'll see the following output:

<weakref at 0x7fd11dba6b30; to 'Org' at 0x7fd11dd7cb80>

That shows that Org is still in memory although it should have been collected.

Expected output:

<weakref at 0x7f34150d2c20; dead>

Notes

I bisected the problem to PR #658. More precisely, it seems to be the assignment cls._original_params = params in the _generate method that causes the memory leak. Therefore, the problem was introduced in version 3.0.0.

I suspect the problem is also present in the Django model factory.

bbc2 avatar Sep 16 '20 13:09 bbc2

Hi! Thanks a lot for the bug report; this is indeed a nasty bug.

The memory leak should indeed be present with Django as well, but won't have as big an impact there. I'll look into it.

rbarrois avatar Oct 03 '20 09:10 rbarrois

I tested this again but with more recent versions of SQLAlchemy and found the following:

  • The memory leak seems to still exist with any version of SQLAlchemy and factory_boy >= 3.
  • The extra SELECT disappears with SQLAlchemy >= 1.3.21 and any version of factory_boy.
    • I suspect the fix is https://docs.sqlalchemy.org/en/14/changelog/changelog_13.html#change-377e70b3f06fb4b100c8a4182baf138c but I'm not certain.

As a result, the impact of this issue is much lower for us, since we only use factory_boy in tests, but I suppose the extra memory reference could still be an issue for other users.

bbc2 avatar Aug 03 '22 11:08 bbc2