Use Pydantic models as the model in Meta

Open jaraqueffdc opened this issue 3 years ago • 9 comments

The problem

Pydantic is a fast-growing library that handles data validation in a very clean way using type hints. If you are working on a Python project that needs to digest and output data models, you are likely to use Pydantic these days, even more so if you are using FastAPI, since it uses Pydantic to validate JSON objects by default.

factory_boy handles several kinds of models, but as far as I know it does not handle JSON Schema, which can be generated from a Pydantic model very easily. It does not accept Pydantic models as a valid model either, so we cannot use factory_boy to generate data for a library as well known as Pydantic.

Proposed solution

It would be great if factory_boy accepted a Pydantic model in the Meta class the same way it accepts models from different ORMs. This would let it read the data types and restrictions from the model and generate data that matches that specific model. As far as I know, no tool currently does this, and I am sure it would be very useful for many people.

Extra notes

I have been looking around and could not find anything similar. In fact, I saw an issue opened in Pydantic in which the author mentioned factory_boy as a good tool for this kind of data generation (Pydantic itself is not intended for that, and it would open a different can of worms).

I am not sure if this is something you would be interested in but I am sure that, with the number of people using pydantic either by itself or as part of fastapi, this addition would be very much appreciated.

If there is already a way of generating data from a JSON schema (which would work as well), please let me know. I have found nothing that does this.

Thanks!

jaraqueffdc avatar Jun 14 '21 07:06 jaraqueffdc

Hi,

That's an interesting idea; however, I'm surprised at the "factory_boy can't feed pydantic models" part: the default Factory class will simply build a set of kwargs from the declarations, and pass them to the model's __init__ method.

If you can do User(id=123, name="John Doe"), then you can write the following factory:

import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    # Sequence requires a callable that receives the counter value.
    id = factory.Sequence(lambda n: n + 1)
    name = factory.Faker("name")

With that factory, calling UserFactory() will build a declaration dict (let's say {"id": 1, "name": "John Doe"}), and pass it to User.__init__: User(id=1, name="John Doe").

Wouldn't that work with pydantic? I'm not familiar with the library.

rbarrois avatar Jun 14 '21 16:06 rbarrois

It probably would, although I might be wrong here: would we need to declare this for every field? Some objects can get very complicated, with nested objects inside, and I thought that having the model definition would allow some automation in the generation. It would be great if one could leverage the model definitions to get some level of mocking out of the box (i.e. look at each field's type and generate values accordingly), and define values properly in the factory only for the fields we care about. Typical objects we handle with Pydantic are JSON responses from APIs, and depending on the API there might be hundreds of fields to track.

Generating the Pydantic model from such big JSON responses is rather straightforward, because the code can be generated automatically from the content of the JSON response, but with the factory we would need to define each field manually. Is this right? Sorry if I am going a bit off topic; I just want to be sure I understand your proposal and that I am not misunderstanding some of the package's capabilities.

I might give this a go and let you know if it works properly with Pydantic objects.

jaraqueffdc avatar Jun 15 '21 06:06 jaraqueffdc

Currently, factory_boy requires you to provide a "recipe" for building the object; that way, you can tell it, for instance, that start_date should be less than end_date, or have the recipe vary on a "trait": "Give me an active user (UserFactory(active=True))" might yield a different set of attributes from "Give me a pending user (UserFactory(pending=True))".

Adding automated factory declaration from the metadata on the model is a future goal (see #836), but the required API design hasn't been done yet.
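To make the idea concrete, here is a rough, dependency-free sketch of type-driven generation. It is not factory_boy's API and not the design discussed in #836; `fake_value` and `auto_build` are hypothetical helpers, and plain dataclasses stand in for Pydantic models.

```python
import random
import string
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional, get_type_hints


def fake_value(tp: Any, rng: random.Random) -> Any:
    """Return a random value for a small set of supported types."""
    if tp is bool:
        return rng.choice([True, False])
    if tp is int:
        return rng.randint(0, 1000)
    if tp is float:
        return rng.random()
    if tp is str:
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if is_dataclass(tp):
        return auto_build(tp, rng=rng)  # recurse into nested models
    raise TypeError(f"no generator registered for {tp!r}")


def auto_build(model: type, rng: Optional[random.Random] = None, **overrides: Any) -> Any:
    """Build `model`, generating every field that is not explicitly overridden."""
    if rng is None:
        rng = random.Random()
    hints = get_type_hints(model)
    values = {}
    for field in fields(model):
        if field.name in overrides:
            values[field.name] = overrides[field.name]
        else:
            values[field.name] = fake_value(hints[field.name], rng)
    return model(**values)


@dataclass
class Address:
    city: str
    zip_code: int


@dataclass
class Customer:
    id: int
    name: str
    active: bool
    address: Address


# Only `name` is pinned; every other field is derived from its type hint.
customer = auto_build(Customer, name="John Doe")
```

A real implementation would also need to handle optional fields, containers and per-field constraints, which is part of why the API design is not trivial.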

rbarrois avatar Jun 15 '21 13:06 rbarrois

Sure, I understand the decision behind it and it makes sense. It would be awesome to have that feature for huge objects, so that the fields we do not want to define are generated for us. With objects like API responses, we could not care less about many of the fields... but we still need them to ensure we are handling the object properly. It is good to see this is being considered, though, and I might take a look at it if I have some spare time as well!

About the Pydantic feature, I tested it yesterday and I think it just works :) I might stumble upon some issues when I start using it in more depth, but the fact that it generates a dictionary that is passed to the constructor, which is really what Pydantic expects when creating an object manually, seems to be doing the trick.

I think the only thing I am missing is the feature you mentioned was already a goal. Another one, which I might do some work on, is creating a factory from a Pydantic model with automatic code generation. That would greatly simplify creating factories when the object is rather big.

I will let you know if I find some use cases where this breaks, but it seems really good so far, thanks!

jaraqueffdc avatar Jun 16 '21 07:06 jaraqueffdc

You can do what rbarrois describes above with Pydantic, and it has worked relatively well in my testing so far. The only place I've personally run into issues is when nesting models using a SubFactory: Pydantic doesn't seem to like receiving pre-instantiated Pydantic model objects as keyword arguments. The workaround I'm currently using looks something like:

import factory as fb
from functools import partial
from typing import Any, Optional, Type


def pydantic_subfactory(factory: Type[fb.Factory], **kwargs: Optional[Any]) -> fb.LazyFunction:
    """Emulate the behavior of a SubFactory in a Pydantic-compatible way."""
    return fb.LazyFunction(partial(fb.build, dict, FACTORY_CLASS=factory, **kwargs))


PydanticSubFactory = pydantic_subfactory

I haven't had any trouble with it so far, but I'm also not exactly using it in the most complex fashion I could be.

Personally, I'd like to see something like the SQLAlchemy-specific factory class implemented for Pydantic: one that behaves identically to the current factory class but ensures that uses of SubFactory or RelatedFactory pass their built objects to the model constructor as a dict instead of as an actual constructed model. If I can find time to work on it, I'll write the subclass myself and make a PR, but if it's something other people need, I would caution them not to wait on me to do it.

the-wondersmith avatar Oct 04 '21 18:10 the-wondersmith

Use pydantic-factories.

conradogarciaberrotaran avatar Apr 01 '22 13:04 conradogarciaberrotaran

I've spent a bit of time evaluating the pydantic-factories package mentioned here. From what I can see, the package is not well integrated with FactoryBoy.

Specifically, pydantic-factories:

  • does not support overriding field values using calls to factoryboy, e.g. first_name = factory.Faker('first_name')
  • does not integrate with factoryboy randomness seeding or with its Faker instance (most values are populated with raw calls to the random package)
  • has almost no connection to factoryboy, other than the fact that its build function is called build() and factories are invoked in roughly the same way

I think pydantic-factories has headed in a different direction and should not be considered as obviating the need for an official factoryboy solution.

billhunekepf avatar Feb 02 '23 20:02 billhunekepf

Please note that since the previous comment, pydantic-factories has evolved into polyfactory: https://polyfactory.litestar.dev/. The new library provides most of the missing features listed above (AFAIK) and has dedicated factories for Pydantic models. I think it might have everything the OP requires plus, most importantly IMO, typing annotations. You should give it a look.

This might resolve the OP's issue.

g0di avatar Sep 13 '23 07:09 g0di

I'm not 100% certain, but I think this functionality is natively supported (although not explicitly, as with factory.django.DjangoModelFactory):

from datetime import datetime
from typing import Optional

import factory
from pydantic import BaseModel


class Account(BaseModel):
    id: int
    name: Optional[str]
    is_active: bool
    created_at: datetime
    updated_at: datetime
    # CHAzurePartnerCustomer is another model, defined elsewhere (not shown).
    azure_partner_customer: Optional[CHAzurePartnerCustomer]


class AccountFactory(factory.DictFactory):
    class Meta:
        model = Account

    id = factory.Faker("pyint")
    name = factory.Faker("name")
    is_active = factory.Faker("pybool")
    created_at = factory.Faker("date_time")
    updated_at = factory.Faker("date_time")

AccountFactory()

It seems to be working fine for me. If I exclude fields within the factory, a pydantic ValidationError is raised, which seems to suggest it's doing what we want.

alex-way avatar Sep 15 '23 10:09 alex-way