factory_boy

Add method create_minimal() that creates minimal number of database objects

Open trebor74hr opened this issue 2 years ago • 1 comments

Database scenarios only

This improvement suggestion applies only when factory_boy is used in database scenarios, i.e. with SQLAlchemyModelFactory, Django, etc.

create() creates complete tree of subobjects

Calling Factory.create() creates the complete tree of subobjects, which can be suboptimal and very slow in complex environments. Simple example:

class CityFactory(SQLAlchemyModelFactory):
    name = ...

class AddressFactory(SQLAlchemyModelFactory):
    street = ...
    city = SubFactory(CityFactory)

class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)
    address_job = SubFactory(AddressFactory)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable

NOTE: In this case both address_job and address_alt are nullable columns in the database table.

Calling:

person1 = PersonFactory.create()
person2 = PersonFactory.create()

For each person the full tree is created:

person1 = new Person(
 |- address = new Address(
 |   |- new City())
 |- address_job = new Address(
 |   |- new City())
 |- address_alt = new Address(
 |   |- new City()))

person2 = new Person (
 |- address = new Address(
 |   |- new City())
 |- address_job = new Address(
 |   |- new City())
 |- address_alt = new Address(
 |   |- new City()))

This issues 2 x 7 SQL INSERT commands (one Person, three Address and three City rows per person). In a much more complex system, with e.g. SubFactory trees 10 levels deep and numerous objects, hundreds of SQL statements are executed and new objects inserted, which causes very slow performance.
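The count above can be checked with a small sketch. It does not use factory_boy itself: `FACTORY_TREE` and `count_inserts` are hypothetical names that only model which factory each SubFactory declaration points at, assuming one INSERT per created object.

```python
# Hypothetical description of the factory tree from the example: each factory
# maps to the factories its SubFactory declarations point at (one entry per
# declaration). This models the shape only, not factory_boy itself.
FACTORY_TREE = {
    "Person": ["Address", "Address", "Address"],  # address, address_job, address_alt
    "Address": ["City"],
    "City": [],
}


def count_inserts(factory_name, tree=FACTORY_TREE):
    """Return how many rows one create() call inserts: the object itself
    plus everything created recursively through its sub-factories."""
    return 1 + sum(count_inserts(child, tree) for child in tree[factory_name])


per_person = count_inserts("Person")
print(per_person)      # 7
print(2 * per_person)  # 14
```

Because every level multiplies the work of the level below it, adding more SubFactory declarations or more depth grows this number quickly.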

Sometimes creating full tree is exactly what we want, but in some cases it is not. Examples:

  1. for nullable fields - in some cases I want to create such sub-objects and in some cases I don't (without removing the SubFactory declaration)
  2. in some cases I am not interested in having distinct, dedicated subobjects in the tree - just take any existing object in the database and assign it to the newly created one

Proposed solution: introduce new create_minimal method and some additional SubFactory arguments

Introduce a new create_minimal method and some additional SubFactory arguments that could work like this:

  • the new method would always create the base object (depth=0)
  • if a SubFactory or Faker receives the create_allways=True parameter, it will be created in create_minimal mode too
  • if the model column is nullable - do not call the Faker or SubFactory creation logic, return None
  • for all SubFactory objects in the whole tree - if an object needs to be created, first search for an existing object in the database; if there is one, take the last (order by id desc).first(), if not, then create it

I will use the previous example, which demonstrates SubFactory tree creation in the standard case. I want new persons created and I don't care about any address. For illustration purposes I will make a small modification: although address_job is nullable, I want it to be created in create_minimal mode too.

class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)
    address_job = SubFactory(AddressFactory, create_allways=True)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable

Calling new method:

person1 = PersonFactory.create_minimal()
person2 = PersonFactory.create_minimal()

In this case we get this:

person1 = new Person(
 |- address = new Address(
 |   |- new City())
 |- address_job = new Address( # create_allways=True
 |   |- person1.address.city) # will reuse the just-created City object
 |- address_alt = None) # since it is nullable

person2 = new Person(
 |- address = person1.address_job # last created
 |- address_job = new Address( # create_allways=True
 |   |- person1.address.city) # will reuse the last created City object
 |- address_alt = None) # since it is nullable

This issues 4 + 2 INSERT commands (instead of 7 + 7 with the standard create()). Each further person issues only 2 INSERT commands (instead of 7 with the normal create()).
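The 4 + 2 figure can be reproduced with a small simulation of the proposed rules. This is only a sketch under stated assumptions: `Sub`, `FakeDB`, `resolve` and this `create_minimal` function are hypothetical stand-ins written in plain Python, not factory_boy or SQLAlchemy APIs, and "take the last object" is modelled as the most recently inserted row.

```python
from dataclasses import dataclass


@dataclass
class Sub:
    """Hypothetical stand-in for a SubFactory declaration."""
    factory: str
    nullable: bool = False
    create_allways: bool = False  # spelling follows the proposal


# Declarations of the example factories (names taken from the issue).
FACTORIES = {
    "City": {},
    "Address": {"city": Sub("City")},
    "Person": {
        "address": Sub("Address"),
        "address_job": Sub("Address", nullable=True, create_allways=True),
        "address_alt": Sub("Address", nullable=True),
    },
}


class FakeDB:
    """In-memory stand-in for the database that counts INSERTs."""

    def __init__(self):
        self.rows = {name: [] for name in FACTORIES}
        self.inserts = 0

    def insert(self, table, row):
        self.rows[table].append(row)
        self.inserts += 1
        return row

    def last(self, table):
        return self.rows[table][-1] if self.rows[table] else None


def resolve(sub, db):
    """Apply the proposed minimal-mode rules to one declaration."""
    if not sub.create_allways:
        if sub.nullable:
            return None                    # nullable: skip the sub-object
        existing = db.last(sub.factory)
        if existing is not None:
            return existing                # reuse the last existing object
    return create_minimal(sub.factory, db)  # otherwise create it


def create_minimal(name, db):
    """Always create the requested object; resolve its sub-objects minimally."""
    row = {attr: resolve(sub, db) for attr, sub in FACTORIES[name].items()}
    return db.insert(name, row)


db = FakeDB()
person1 = create_minimal("Person", db)
print(db.inserts)  # 4
person2 = create_minimal("Person", db)
print(db.inserts)  # 6
```

In this sketch person2's address is the very same object as person1's address_job, which matches the tree shown above.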

In a much more complex environment this would drastically reduce the number of SQL statements and speed up object creation.

Implementation suggestion

Implementation could have logic like this:

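class SubFactory:
    ...

    def evaluate_pre(self, *args, **kwargs):
        obj = None
        if self.is_mode_minimal() and not self.create_allways:
            # Nullable column: skip the sub-object entirely.
            if self.is_model_field_nullable():
                return None
            # Otherwise try to reuse an object that already exists in the database.
            obj = self.get_database_instance_first_or_none()

        # Nothing to reuse (or standard mode): fall back to normal creation.
        if obj is None:
            obj = super().evaluate_pre(*args, **kwargs)

        return obj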

Note: the strategy for choosing which object to take from the database could be parametrized, e.g. take last created, take first, take any, take from owner, take from owner's owner, take from my internal object cache ...
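One way such a parametrized strategy could look, as a minimal sketch: the names `take_last`, `take_first`, `take_any`, `STRATEGIES` and `pick_existing` are hypothetical, and the candidates are assumed to be a plain list ordered oldest first rather than a real database query.

```python
import random


# Each strategy receives the candidate rows (oldest first) and returns
# one of them, or None when there is nothing to reuse.
def take_last(rows):
    return rows[-1] if rows else None


def take_first(rows):
    return rows[0] if rows else None


def take_any(rows):
    return random.choice(rows) if rows else None


STRATEGIES = {"last": take_last, "first": take_first, "any": take_any}


def pick_existing(rows, strategy="last"):
    """Return an existing row according to the named strategy, or None,
    in which case the caller would fall back to creating a new object."""
    return STRATEGIES[strategy](rows)


cities = ["city-1", "city-2", "city-3"]
print(pick_existing(cities))           # city-3
print(pick_existing(cities, "first"))  # city-1
print(pick_existing([], "any"))        # None
```

Strategies that depend on the owner object or on an internal cache would need more context than a list of rows, so their signature would have to be wider than the one assumed here.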

Alternative solution

If you don't like the idea, an alternative could be to provide extra callback functions in the create() method, so that the user can decide in the callback whether or not to create an object.

Example:

def custom_hook(depth, field, factory, session):  # i.e. all needed args
    if depth > 3 and field.is_nullable:
        return None
    if factory == CityFactory:
        return session.query(City).first()  # first row, or None if the table is empty
    return PreCreateHookStrategy.CREATE_OBJECT

Factory.create(pre_create_hook=custom_hook, ...)

trebor74hr · Apr 26 '22 12:04

Thanks for taking the time to write the proposal. I don’t see much improvement over the existing (and simpler IMO) alternative of subclassing factories. One can already write:

class CityFactory(SQLAlchemyModelFactory):
    name = ...


class AddressFactory(SQLAlchemyModelFactory):
    street = ...
    city = SubFactory(CityFactory)


class PersonFactory(SQLAlchemyModelFactory):
    full_name = ...
    address = SubFactory(AddressFactory)


class PersonWithAddressFactory(PersonFactory):
    address_job = SubFactory(AddressFactory)  # nullable
    address_alt = SubFactory(AddressFactory)  # nullable

Also, Traits offer a means of tweaking what data is generated for an individual factory.

francoisfreitag · Sep 24 '22 10:09