factory_boy
Difficult and confusing to create randomized models
The problem
I found it very hard to randomize models, and even harder to randomize models based on input parameters. I have finally arrived at a solution that basically makes every attribute of my factories a factory.LazyAttribute that calls some function to compute random values (using a faker and homebrew mashup). These functions sometimes take nothing, sometimes they take a parameter from Params that I can pass in through the LazyAttribute's input lambda.
Here is a snippet of what I've done, as an example:
import factory
import string
import random
from factory.django import DjangoModelFactory
from factory.faker import faker
from django.contrib.auth.models import User

fake = faker.Faker()


def get_first_name_from_gender(gender):
    if gender == 'male':
        first_name = fake.first_name_male()
    elif gender == 'female':
        first_name = fake.first_name_female()
    elif gender == 'non-binary':
        first_name = fake.first_name_nonbinary()
    else:
        raise ValueError('Input gender must be male, female, or non-binary.')
    return first_name


def random_email_generator():
    # username/email must be unique, so we need to generate a random one that likely will not exist elsewhere
    num_chars = 24
    char_options = string.ascii_lowercase + string.digits
    email = "".join(random.choices(char_options, k=num_chars))
    email += '@example.com'
    return email


class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    class Params:
        gender = 'male'

    username = factory.LazyAttribute(lambda o: random_email_generator())
    # copy the generated username; `email = username` would evaluate the declaration a second time and yield a different value
    email = factory.SelfAttribute('username')
    first_name = factory.LazyAttribute(lambda o: get_first_name_from_gender(o.gender))
    # set the user's password to "password" after generation using the set_password method to ensure it's properly hashed in the database
    password = factory.PostGenerationMethodCall('set_password', 'password')
This is a factory where I wanted to base the user's name on their gender. I wanted the ability to specify a fake user's gender externally, hence the gender param.
The reason it was so hard to add randomization seems to be the design choice of making all model fields class attributes of the model factory class. Because of this, assigning random values to fields directly in the class body (and as described in the docs) doesn't work. If you do that, the values are computed once at import time and stay fixed for the rest of that Python session. Using (or abusing?) LazyAttribute as I've described above seems to be the only path forward.
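To make the import-time behavior concrete, here is a minimal sketch in plain Python, with no factory_boy involved. The names EagerUser, LazyUser, and build are hypothetical and exist only for this illustration; build is a toy stand-in for a factory, not how factory_boy is implemented.

```python
import random


class EagerUser:
    # The right-hand side runs once, while the class body is executed
    # (normally at import time), so every build sees the same value.
    lucky_number = random.randint(0, 10**9)


class LazyUser:
    # Only the callable is stored; it runs again each time an object is built.
    lucky_number = staticmethod(lambda: random.randint(0, 10**9))


def build(declarations):
    """Toy stand-in for a factory: call callables, copy plain values as-is."""
    result = {}
    for name in dir(declarations):
        if name.startswith("_"):
            continue
        value = getattr(declarations, name)
        result[name] = value() if callable(value) else value
    return result


eager_values = {build(EagerUser)["lucky_number"] for _ in range(5)}
lazy_values = {build(LazyUser)["lucky_number"] for _ in range(50)}
print(len(eager_values))      # a single distinct value
print(len(lazy_values) > 1)   # values differ between builds
```

In factory_boy the stored callable corresponds to declarations such as factory.LazyFunction (no arguments) or factory.LazyAttribute (receives the object being built), which is why wrapping the random call in one of them gives a fresh value per object.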
Proposed solution
I think that it would be more pythonic, make usage more intuitive, and reduce the learning curve if there was some function of DjangoModelFactory and other factory classes that child classes could override, something like build_model(), that is called to construct the model every time one is requested. If fields were then assigned in this function, one could write regular Python with random or np.random to easily create randomized fields.
Extra notes
Could you please comment on the design decision to make all model fields class attributes of the model factory class? I feel like you must have had a good reason for approaching the problem this way, but I can't understand why.
One more note, I understand the intended use case for factory_boy is for generating test fixtures. I suppose I am somewhat abusing it in the first place, as I am using it to try to fill my development database with plenty of random but semi-realistic data. I'm an experienced developer, but am very new to web development/Django and began using factory_boy for this purpose based on this blog post https://mattsegal.dev/django-factoryboy-dummy-data.html. If you think this is a bad idea, or know a better library that is intended for this purpose, I'd be very grateful for any advice you might have to offer. Thanks!
> One more note, I understand the intended use case for factory_boy is for generating test fixtures. I suppose I am somewhat abusing it in the first place, as I am using it to try to fill my development database with plenty of random but semi-realistic data.
I have been doing this for years. I don't think this is an abuse at all.
The major struggle I have faced is that I try to reuse factories for both test fixtures and initial development setup data. But since factory declarations are usually tailored for test expressiveness at the expense of realistic data, the script that loads the initial data grows into somewhat unstructured, tightly coupled spaghetti.
But I still think it is a successful approach.
The function closest to what you are asking for would be the instantiate method of the options class. You could always override it like this:
import factory.django
from django.contrib.auth.models import User


class HackyOptions(factory.django.DjangoOptions):
    def instantiate(self, step, args, kwargs):
        # ???
        # super() already binds self, so it must not be passed again
        obj = super().instantiate(step, args, kwargs)
        # ???
        return obj


class UserFactory(factory.django.DjangoModelFactory):
    _options_class = HackyOptions

    class Meta:
        model = User

    # ...
This is not deliberately hidden, and it is useful for gaining control over the details of complex instance generation, but I think it is just not expected to be used by regular factory_boy users. factory_boy aims at providing utilities for declarative and composable factories. If you are going the imperative way, you might find less trouble by... not using factory_boy!
I believe what you want to achieve can be done declaratively with fakers, sequences (for username uniqueness and whatnot), and traits and "maybe" declarations instead of those functions. It is just an alternative paradigm.