factory_boy
Add method create_minimal() that creates minimal number of database objects
Database scenarios only
This suggestion applies only when factory_boy is used in database scenarios, i.e. SQLAlchemyModelFactory/Django... etc.
create() creates complete tree of subobjects
Calling Factory.create() creates the complete tree of subobjects, which can be suboptimal and very slow in complex environments. Simple example:
from factory import SubFactory
from factory.alchemy import SQLAlchemyModelFactory

class CityFactory(SQLAlchemyModelFactory):
    name = ...

class AddressFactory(SQLAlchemyModelFactory):
    street = ...
    city = SubFactory(CityFactory)

class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)
    address_job = SubFactory(AddressFactory)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable
NOTE: In this case both address_job and address_alt are nullable columns in the database table.
Calling:
person1 = PersonFactory.create()
person2 = PersonFactory.create()
For each person, the full tree is created:
person1 = new Person(
|- address = new Address(
| |- new City())
|- address_job = new Address(
| |- new City())
|- address_alt = new Address(
| |- new City()))
person2 = new Person (
|- address = new Address(
| |- new City())
|- address_job = new Address(
| |- new City())
|- address_alt = new Address(
| |- new City()))
This issues 2 x 7 SQL INSERT commands. In a much more complex system, with e.g. 10-level-deep SubFactory trees and numerous objects, hundreds of SQL statements are executed and new objects inserted, which makes test setup very slow.
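As a rough check of the 2 x 7 figure, the declarations above can be modelled as a plain mapping from each factory to the sub-factories it declares. This is only a counting sketch: TREE and count_inserts are illustrative names, not part of factory_boy, and the model assumes one INSERT per created object.

```python
# Toy model of the example: each factory maps to the sub-factories it declares.
# TREE and count_inserts are hypothetical names used only for this illustration.
TREE = {
    "Person": ["Address", "Address", "Address"],  # address, address_job, address_alt
    "Address": ["City"],
    "City": [],
}


def count_inserts(factory: str) -> int:
    """Count objects created by one full create() call: the object plus its whole subtree."""
    return 1 + sum(count_inserts(child) for child in TREE[factory])


print(count_inserts("Person"))      # 7 objects per person
print(2 * count_inserts("Person"))  # 14 for the two calls above
```

The count grows multiplicatively with depth and fan-out, which is why deep SubFactory trees get expensive quickly.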
Sometimes creating the full tree is exactly what we want, but in some cases it is not. Examples:
- for nullable fields: in some cases I want such sub-objects created and in others I don't (without removing the SubFactory declaration)
- in some cases I don't need distinct, dedicated subobjects in the tree: just take any existing object in the database and assign it to the newly created one
Proposed solution: introduce new create_minimal method and some additional SubFactory arguments
Introduce a new create_minimal method and some additional SubFactory arguments that could work like this:
- the new method would always create the base object (depth=0)
- if a SubFactory or Faker receives the create_allways=True parameter, it will be created in create_minimal mode too
- if the model column is nullable, do not call the Faker or SubFactory creation logic; return None
- for all SubFactory objects in the whole tree: if an object needs to be created, first search for any existing object in the database; if there is one, take the last (order by id desc).first(); if not, create it
I will reuse the previous example, which demonstrates SubFactory tree creation in the standard case. I want new persons created and I don't care about any address. For illustration purposes I will make a small modification: although address_job is nullable, I want it to be created in create_minimal mode too.
class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)
    address_job = SubFactory(AddressFactory, create_allways=True)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable
Calling new method:
person1 = PersonFactory.create_minimal()
person2 = PersonFactory.create_minimal()
In this case we get this:
person1 = new Person(
|- address = new Address(
| |- new City())
|- address_job = new Address( # create_allways=True
| |- person1.address.city # will reuse just created City object
|- address_alt = None # since it is nullable
person2 = new Person(
|- address = person1.address_job # last created
|- address_job = new Address( # create_allways=True
| |- person1.address.city # will reuse last created City object
|- address_alt = None # since it is nullable
This issues 4 + 2 INSERT commands (instead of 7 + 7 with the standard create()). Each subsequent person needs only 2 INSERT commands (instead of 7 with the normal create()).
In a much more complex environment this would drastically reduce the number of SQL statements and speed things up.
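The 4 + 2 count can be reproduced with a small in-memory model of the proposed rules. This is a sketch under stated assumptions, not factory_boy code: Sub, FakeDB, build and resolve are hypothetical names, the "database" is a dict of lists, and "take last" simply means the most recently inserted row. The create_allways spelling follows the proposal.

```python
from dataclasses import dataclass


@dataclass
class Sub:
    """Hypothetical stand-in for a SubFactory declaration."""
    target: str
    nullable: bool = False
    create_allways: bool = False  # spelling as in the proposal


# Toy declarations mirroring the example factories.
FACTORIES = {
    "City": {},
    "Address": {"city": Sub("City")},
    "Person": {
        "address": Sub("Address"),
        "address_job": Sub("Address", nullable=True, create_allways=True),
        "address_alt": Sub("Address", nullable=True),
    },
}


class FakeDB:
    """In-memory stand-in for a database session that counts INSERTs."""

    def __init__(self):
        self.rows = {name: [] for name in FACTORIES}
        self.inserts = 0

    def insert(self, table, row):
        self.rows[table].append(row)
        self.inserts += 1
        return row

    def last(self, table):
        return self.rows[table][-1] if self.rows[table] else None


def resolve(db, sub, minimal):
    """Apply the proposed rules to one sub-object declaration."""
    if minimal and not sub.create_allways:
        if sub.nullable:
            return None                    # nullable column: skip creation
        existing = db.last(sub.target)
        if existing is not None:
            return existing                # reuse the last inserted object
    return build(db, sub.target, minimal)  # otherwise create it


def build(db, name, minimal):
    """Create one object; its sub-objects are resolved before the INSERT."""
    row = {field: resolve(db, sub, minimal) for field, sub in FACTORIES[name].items()}
    return db.insert(name, row)


db = FakeDB()
person1 = build(db, "Person", minimal=True)
first = db.inserts
person2 = build(db, "Person", minimal=True)
print(first, db.inserts - first)  # 4 2
```

Calling build(FakeDB(), "Person", minimal=False) on a fresh FakeDB records 7 inserts, matching the standard create() case.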
Implementation suggestion
Implementation could have logic like this:
class SubFactory:
    ...
    def evaluate_pre(self, *args, **kwargs):
        if self.is_mode_minimal() and not self.create_allways:
            if self.is_model_field_nullable():
                return None
            obj = self.get_database_instance_first_or_none()
        else:
            obj = None
        if not obj:
            obj = super().evaluate_pre(*args, **kwargs)
        return obj
Note: the strategy for choosing which object to take from the database could be parametrized, e.g. take the last created, take the first, take any, take from the owner, take from the owner's owner, take from an internal object cache ...
Alternative solution
If you don't like the idea, an alternative could be to provide extra callback functions in the create() method, so that the user can decide in the callback whether or not to create an object.
Example:
def custom_hook(all-needed-args):
    if depth > 3 and field.is_nullable:
        return None
    if factory == CityFactory:
        return session.query(City).get_first_or_none()
    return PreCreateHookStrategy.CREATE_OBJECT

Factory.create(pre_create_hook=custom_hook, ...)
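To make the callback idea concrete, here is a minimal stdlib-only sketch of how such a hook could drive object creation. Everything in it is an assumption for illustration: the hook signature (name, depth, nullable, db), the SPEC table, the create function and the dict-based "database" are invented here, and the depth threshold is lowered from 3 to 0 so it has an effect on this small tree. PreCreateHookStrategy is modelled as a one-member enum used as a sentinel.

```python
from enum import Enum


class PreCreateHookStrategy(Enum):
    CREATE_OBJECT = "create_object"  # sentinel: "go ahead and create"


# Toy declarations: field -> (target factory, nullable). Hypothetical model.
SPEC = {
    "City": {},
    "Address": {"city": ("City", False)},
    "Person": {
        "address": ("Address", False),
        "address_job": ("Address", True),
        "address_alt": ("Address", True),
    },
}


def create(name, db, pre_create_hook, depth=0, nullable=False):
    """Ask the hook first; create the object only if the hook says so."""
    decision = pre_create_hook(name, depth, nullable, db)
    if decision is not PreCreateHookStrategy.CREATE_OBJECT:
        return decision  # None, or an existing object chosen by the hook
    row = {
        field: create(target, db, pre_create_hook, depth + 1, is_nullable)
        for field, (target, is_nullable) in SPEC[name].items()
    }
    db.setdefault(name, []).append(row)  # stands in for an INSERT
    return row


def custom_hook(name, depth, nullable, db):
    if depth > 0 and nullable:
        return None                      # skip nullable sub-objects
    if name == "City" and db.get("City"):
        return db["City"][-1]            # reuse the last City
    return PreCreateHookStrategy.CREATE_OBJECT


db = {}
person1 = create("Person", db, custom_hook)
person2 = create("Person", db, custom_hook)
print({table: len(rows) for table, rows in db.items()})
```

With this hook, both persons share a single City and neither gets the nullable addresses. The policy lives entirely in user code, which is the main difference from the declarative create_allways flag.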
Thanks for taking the time to write the proposal. I don’t see much improvement over the existing (and simpler IMO) alternative of subclassing factories. One can already write:
class CityFactory(SQLAlchemyModelFactory):
    name = ...

class AddressFactory(SQLAlchemyModelFactory):
    street = ...
    city = SubFactory(CityFactory)

class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)

class PersonWithAddressFactory(PersonFactory):
    address_job = SubFactory(AddressFactory)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable
Also, Traits offer a means of tweaking what data is generated for an individual factory.