Difficult and confusing to create randomized models

Open · atpjpta opened this issue 3 years ago · 2 comments

The problem

I found it very hard to randomize models, and even harder to randomize models based on input parameters. I have finally arrived at a solution that basically makes every attribute of my factories a factory.LazyAttribute that calls some function to compute random values (using a mashup of Faker and homebrew code). Some of these functions take no arguments; others take a parameter from Params, which I pass in through the LazyAttribute's lambda.

Here is a snippet of what I've done, as an example:

```python
import random
import string

import factory
import faker
from django.contrib.auth.models import User
from factory.django import DjangoModelFactory

fake = faker.Faker()

def get_first_name_from_gender(gender):
    if gender == 'male':
        first_name = fake.first_name_male()
    elif gender == 'female':
        first_name = fake.first_name_female()
    elif gender == 'non-binary':
        first_name = fake.first_name_nonbinary()
    else:
        raise ValueError('Input gender must be male, female, or non-binary.')

    return first_name

def random_email_generator():
    # username/email must be unique, so generate a random one that is unlikely to exist elsewhere
    num_chars = 24
    char_options = string.ascii_lowercase + string.digits
    email = "".join(random.choices(char_options, k=num_chars))
    email += '@example.com'
    return email

class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    class Params:
        gender = 'male'

    username = factory.LazyAttribute(lambda o: random_email_generator())
    # copy the generated username so that both fields hold the same address
    email = factory.SelfAttribute('username')
    first_name = factory.LazyAttribute(lambda o: get_first_name_from_gender(o.gender))

    # set the user's password to "password" after generation via set_password, so it is properly hashed in the database
    password = factory.PostGenerationMethodCall('set_password', 'password')
```

This is a factory where I wanted to base the user's first name on their gender. I wanted to be able to specify a fake user's gender externally, hence the gender param.

The reason it was so hard to add randomization seems to be the design choice of making all model fields class attributes of the model factory class. Because of this, assigning random values to fields directly in the class body (as described in the docs) doesn't work: the values are computed once at import time and stay fixed for the rest of that Python session. Using (or abusing?) LazyAttribute as I've described above seems to be the only path forward.

Proposed solution

I think it would be more Pythonic, make usage more intuitive, and reduce the learning curve if DjangoModelFactory and the other factory classes had a method that child classes could override, something like build_model(), which is called to construct the model every time one is requested. If fields were then assigned in this method, one could write regular Python with random or np.random to easily create randomized fields.

Extra notes

Could you please comment on the design decision to make all model fields class attributes of the model factory class? I feel like you must have had a good reason for approaching the problem this way, but I can't understand why.

One more note, I understand the intended use case for factory_boy is for generating test fixtures. I suppose I am somewhat abusing it in the first place, as I am using it to try to fill my development database with plenty of random but semi-realistic data. I'm an experienced developer, but am very new to web development/django and began using factory_boy for this purpose based on this blog post https://mattsegal.dev/django-factoryboy-dummy-data.html. If you think this is a bad idea, or know a better library that is intended for this purpose, I'd be very grateful for any advice you might have to offer. Thanks!

atpjpta, Apr 25 '21 22:04

> One more note, I understand the intended use case for factory_boy is for generating test fixtures. I suppose I am somewhat abusing it in the first place, as I am using it to try to fill my development database with plenty of random but semi-realistic data.

I have been doing this for years. I don't think this is an abuse at all.

The major struggle I have faced is that I try to reuse factories for both test fixtures and initial development setup data. But since factory declarations are mostly tailored for test expressiveness at the expense of sensible data, the script that loads initial data grows into a somewhat unstructured and tightly coupled spaghetti script.

But I still think it is a successful approach.

n1ngu, Sep 16 '22 08:09

The closest thing to what you are asking for is the instantiate method of the options class. You could always override it like this:

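```python
import factory.django
from django.contrib.auth.models import User

class HackyOptions(factory.django.DjangoOptions):
    def instantiate(self, step, args, kwargs):
        # ??? adjust args/kwargs before the model is instantiated
        obj = super().instantiate(step, args, kwargs)
        # ??? post-process the freshly built/created instance
        return obj

class HackyDjangoModelFactory(factory.django.DjangoModelFactory):
    # _options_class is resolved from a factory's base classes,
    # so declare it on an abstract base and inherit from that
    _options_class = HackyOptions

    class Meta:
        abstract = True

class UserFactory(HackyDjangoModelFactory):
    class Meta:
        model = User
    # ...
```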

This is not deliberately hidden, and it is useful for gaining control over complex instance-generation details, but I don't think regular factory_boy users are expected to use it. factory_boy aims to provide utilities for declarative and composable factories; if you are going the imperative way, you might find less trouble by... not using factory_boy!

I believe what you want to achieve can be done declaratively with fakers, sequences (for username uniqueness and whatnot), traits and factory.Maybe declarations instead of those functions. It is just an alternative paradigm.

n1ngu, Sep 16 '22 22:09