
support for object generation from marshmallow schema

Open jo-tham opened this issue 8 years ago • 18 comments

Hi,

is there any interest in adding support for generating objects based on marshmallow schema? What would the main steps be?

https://github.com/marshmallow-code/marshmallow

perhaps it is practical to use the ORM backend and simply constrain the fields to those used in the marshmallow schema...

jo-tham avatar Mar 01 '16 18:03 jo-tham

I started using marshmallow recently, and very happy with it, so I certainly wouldn't be opposed to adding support.

However, I'm not clear what the use case is here... can you elaborate?

jeffwidman avatar Mar 01 '16 18:03 jeffwidman

Thanks, Jeff

to elaborate:

I have a flask-restful api which uses marshmallow schema to serialize and preprocess ORM objects. http://marshmallow-jsonapi.readthedocs.org/en/latest/quickstart.html#flask-integration (not worried about the jsonapi-part; it's easy enough to wrap valid data in the jsonapi format)

I am writing tests for the api using requests and a running instance of the application.

I'm currently writing data fixtures by hand for use in POST requests. It would be better to generate these fixtures.

side benefit - since marshmallow is agnostic about the objects it serializes, it might be useful as a general schema for generating objects, too. people could have schemas for their fixtures separate from the types of objects they are dealing with (e.g. mongoengine vs sqlalchemy)

Does that make sense? What do you think?

I also thought it might fit as module in marshmallow or a library unto itself integrating factory_boy and marshmallow.

jo-tham avatar Mar 01 '16 18:03 jo-tham

Hi,

marshmallow looks quite interesting indeed :)

Building adapters for factory_boy is (hopefully) rather easy; regarding marshmallow, how would you integrate them? I find it helpful to start writing down usage examples: this helps to clarify the problem we're solving ;)

So, how would you want to call a factory related to a marshmallow schema? :-)

rbarrois avatar Mar 01 '16 21:03 rbarrois

thanks for the input @rbarrois!

i will make some time this evening to create a couple usage examples. I will also take a look at the factory_boy modules/api and add some comments about making the integration.

jo-tham avatar Mar 01 '16 22:03 jo-tham

Here's a hypothetical usage

import datetime as dt
from marshmallow import Schema, fields
import factory

# serializes any object to dictionary based on class attributes
class UserSchema(Schema):
    username = fields.Str()
    joined_at = fields.DateTime()
    password = fields.Str(load_only=True)


class UserSchemaDataFactory(factory.marshmallow.MarshmallowSchemaFactory):
    class Meta:
        model = UserSchema

    username = factory.Faker('user_name')
    joined_at = factory.Faker('date_time')
    password = factory.Faker('word')


user = UserSchemaDataFactory.stub()

print(user)
# {
#   'username': 'morpheus2',
#   'joined_at': '2016-03-01 21:16:06.748186',
#   'password': 'auspcicious'
# }

type(user)
# <class 'dict'>

But it looks like the desired outcome is possible via outputting factory to a dict.

factory.build(dict, FACTORY_CLASS=UserSchemaDataFactory)

What I really hoped was to introspect on the Meta model to bypass the need to declare attributes of the factory class, i.e.

class UserSchemaDataFactory(factory.marshmallow.MarshmallowSchemaFactory):
    class Meta:
        model = UserSchema

user = UserSchemaDataFactory.stub()

I thought this existed for the ORM handlers in factory_boy but it's not the case.

It does seem to exist in mixer, which was the other package I was considering for generating fixtures.

jo-tham avatar Mar 02 '16 05:03 jo-tham

Sorry, we just had a baby so I've been short on time/sleep.

I hit a similar issue as you, and ultimately ended up just using the dict recipe. I found that more effective actually because I needed to manually specify which fields to include in my test fixtures to make sure I'd properly set Marshmallow's load_only/dump_only attributes on different fields.

Ultimately, it sounds like the feature request here is ORM class introspection to generate field types automatically... nothing Marshmallow specific.

Not sure how @rbarrois feels about this, but I'm hesitant to add this because I'm not convinced it would add enough value to be worth the additional maintenance.

When I look at my projects, for the same SQLAlchemy field types I've used multiple factory_boy generators depending on foreign keys, custom field constraints, etc. I want my fake data to mirror the real data as much as possible, so I often tweak the faker generators.

There are other random problems when trying to magically guess the correct fake data. For example, faker has a limited set of words in the lorem ipsum dataset, and I ran into problems trying to use this for fake data that needed to be unique... I had to tack on a random character.

So I'm just not convinced there's enough of a 1:1 mapping between field types and factory_boy/faker to be worth writing/maintaining the introspection code. This holds true at least for SQLAlchemy, perhaps it's different for Django.

Again, I'm running on relatively little sleep, so if I'm overlooking anything obvious here, feel free to point it out.

PS:

I am writing tests for the api using requests and a running instance of the application.

You can skip requests and just use the built-in Flask test client which is just a wrapper around Werkzeug's test client. It's how I test my API, and it works perfectly. Makes it easy to push a test_request context, etc.
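A minimal sketch of that approach, assuming a made-up `/users` endpoint and payload; only `app.test_client()` and its `post(..., json=...)` call are Flask's own API, the rest is illustrative:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/users", methods=["POST"])
def create_user():
    # Hypothetical endpoint: echo back the username that was posted.
    payload = request.get_json()
    return jsonify({"username": payload["username"]}), 201


# No running server and no `requests` needed: the test client calls the
# WSGI app in-process.
client = app.test_client()
response = client.post("/users", json={"username": "morpheus2", "password": "secret"})
print(response.status_code, response.get_json())
```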

jeffwidman avatar Mar 21 '16 21:03 jeffwidman

Well, model introspection is a work-in-progress dating back one year ago, see here: https://github.com/rbarrois/factory_boy/commit/4046c55710d5d7073018dcc76aa3e8e5a7f803eb

It still needs some work (more robust code, more tests, lots more docs).

If someone is interested in helping there, let's go!

rbarrois avatar Apr 06 '16 22:04 rbarrois

Sorry for the delay, gentlemen. I started a new job this week and was wrapping up projects for clients before that.

I still need to review mixer and see if it's a better fit for marshmallow schema. If mixer doesn't look more convenient/robust, I'd love to pick up on 4046c55 (and if marshmallow doesn't fit in factory boy I can make a separate lib for generating from marshmallow). It'll be a few weeks until I settle in to the new job and have time to evaluate these things. Glad to assist in review if anyone gets to it first.

jo-tham avatar Apr 07 '16 00:04 jo-tham

@jo-tham Awesome! Let us know if you need any help on this, or if you find shortcomings in factory_boy that cause you to discard it ;)

rbarrois avatar Apr 18 '16 21:04 rbarrois

@rbarrois I have been working on getting your branch 4046c55 working with the current master; however, I have a design question:

For me, the main desire for an automatic generator is that I want to be able to create a record with the minimal amount of data necessary to allow it to save. Fields that have defaults, or can be blank/null generally don't need a factory boy definition.

For example, if you have a CharField(max_length=20,blank=True,null=False) then django will already initialise this field to '' by default. Similarly, IntegerField(default=0,null=False) doesn't need a definition.

Would you have an objection with me changing the behaviour (for django -- would actually be up to each introspector to decide what is necessary) to only generate data where it's actually necessary in order to be able to save the record?
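The rule being proposed can be sketched without Django at all. This is only an illustration under stated assumptions: `FieldInfo` and `needs_generated_value` are hypothetical names, and the real introspector would read these flags from the model's field objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical, framework-free description of a model field."""

    name: str
    null: bool = False          # column accepts NULL
    blank: bool = False         # empty value passes validation
    has_default: bool = False   # field declares a default
    is_text: bool = False       # text fields can fall back to ''


def needs_generated_value(field: FieldInfo) -> bool:
    """Return True only when saving would fail without an explicit value."""
    if field.has_default or field.null:
        return False
    if field.blank and field.is_text:
        return False
    return True


fields = [
    FieldInfo("nickname", blank=True, is_text=True),  # like CharField(blank=True, null=False)
    FieldInfo("login_count", has_default=True),       # like IntegerField(default=0, null=False)
    FieldInfo("email", is_text=True),                 # no default, not nullable
]
required = [f.name for f in fields if needs_generated_value(f)]
print(required)  # ['email']
```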

levic avatar Sep 08 '17 04:09 levic

To clarify: this is when determining what fields to auto generate; if you specifically include a field in the list of fields to generate then it would still use fuzzy/faker to generate a random value

levic avatar Sep 08 '17 05:09 levic

@levic wow, awesome!!

Your suggestion looks good to me; and we might still add an option to say fields = ['*'] to force generating a relevant faker/fuzzy for each field.

rbarrois avatar Sep 08 '17 09:09 rbarrois

I have essentially completed it at https://github.com/levic/factory_boy/commits/wip/auto_factory and wasn't planning to open a pull request until I've used it for a week in a production project, in case the test cases missed something (unless you want to replace the wip/auto_factory branch in the main repo).

General Changes

  • It will now look at null/blank/default and only create default field definitions when it looks like full_clean() or save() will fail
  • If field.choices is set then it will randomly pick from the available choices regardless of the field type
  • Removed conditional support for django < 1.8 (1.7 ended maintenance 2 years ago)
  • Removed code you wrote for OptionDefault parent dict merging, and low/high fuzzy number generation that wasn't used. I extracted them out as the last 2 commits in this branch (https://github.com/levic/factory_boy/commits/wip/auto_factory_extra), they're not in the main auto_factory branch

Factory Interface changes

  • auto_fields has been split into default_auto_fields/include_auto_fields/exclude_auto_fields
class MyFactory:
    class Meta:
        model = MyModel
        default_auto_fields = True
        include_auto_fields = ['extra_field1', 'extra_field2']
        exclude_auto_fields = ['ignore_this_field']

  • default_auto_fields - if True then the default set of fields will be included
  • include_auto_fields - tells the introspector to additionally autogenerate definitions for these fields. If default_auto_fields is False then fields listed in include_auto_fields will still be included
  • exclude_auto_fields - tells the introspector to never autogenerate definitions for these fields
  • The rationale behind this is to allow for easier inheriting of settings for abstract factories (eg to have a base class in your application that has a blacklist of fields that are always included/excluded regardless of the model).
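How the three options combine can be sketched as a small pure function. This is not the branch's code: `resolve_auto_fields` is a hypothetical helper, and it assumes that exclusion wins over inclusion, which is how "never autogenerate" reads above.

```python
def resolve_auto_fields(candidate_fields, default_auto_fields=True,
                        include_auto_fields=(), exclude_auto_fields=()):
    """Return the field names to autogenerate, in a stable order."""
    # Start from the introspector's default set, or from nothing.
    selected = list(candidate_fields) if default_auto_fields else []
    # Included fields are added even when the default set is switched off.
    for name in include_auto_fields:
        if name not in selected:
            selected.append(name)
    # Excluded fields are always dropped (assumption: exclude beats include).
    excluded = set(exclude_auto_fields)
    return [name for name in selected if name not in excluded]


defaults = ["title", "ignore_this_field"]
print(resolve_auto_fields(defaults, True, ["extra_field1"], ["ignore_this_field"]))
# ['title', 'extra_field1']
print(resolve_auto_fields(defaults, False, ["extra_field1"]))
# ['extra_field1']
```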

Introspector Interface Changes

  • In my particular use case I want to first of all look at field names and only if these don't match then fall back to looking at field types. This was possible before but was a bit clumsy. I've changed the interface so that build_declaration is easier to override and do this (more of the logic that you shouldn't need to override is now in build_declarations)
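The name-first, type-second lookup described here might look roughly like the following. It is a standalone illustration, not the introspector interface from the branch: `NAME_RULES`, `TYPE_RULES` and `pick_generator` are hypothetical names.

```python
import random
import string


def random_text(length=8):
    """Return a random lowercase string."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


# Rules keyed by field name take priority over rules keyed by field type.
NAME_RULES = {
    "email": lambda: f"{random_text()}@example.com",
}
TYPE_RULES = {
    str: random_text,
    int: lambda: random.randint(0, 100),
}


def pick_generator(field_name, field_type):
    """Look the field up by name first, then fall back to its type."""
    if field_name in NAME_RULES:
        return NAME_RULES[field_name]
    for base in field_type.__mro__:   # also matches subclasses of known types
        if base in TYPE_RULES:
            return TYPE_RULES[base]
    return None                       # no rule: the caller has to decide


print(pick_generator("email", str)())
print(pick_generator("age", int)())
```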

Including all Fields

  • I looked at making '*' an option to include_auto_fields, but there are still fields you don't want to include (eg reverse foreign keys, AutoField).
    • Instead I made a variant introspector; see the code here for an example. It is not as succinct as just including '*' but it substantially simplifies the internal implementation.

Outstanding Issues

  • I have looked at the logic in model mommy and added test cases for issues that they came across, but haven't tested generic relations (there is code in there but it may or may not actually work)
  • No documentation update. Will try to do this sometime in the next week or two before the pull request

levic avatar Sep 09 '17 03:09 levic

Hey @jo-tham, I just ran into this

I am writing tests for the api using requests and a running instance of the application.

I'm currently writing data fixtures by hand for use in POST requests. It would be better to generate these fixtures.

exact use case, and was wondering if there is any update regarding the ability to generate fake data from a marshmallow schema. Is this currently possible without having to resort to a SQLAlchemy or Django model?

EDIT: the reason I do not want to use an ORM model is because I'm trying to generate data for ElasticSearch documents, and not the DB layer. Would be interested to know if there are other tools / options available at the moment to do this.

SirR4T avatar Feb 04 '19 06:02 SirR4T

Hey All,

I use factory-boy for Marshmallows and it's great. In testing I sometimes want the marshmallow's JSON representation and other times I need the object version of the data. To accomplish this I've created a mix-in to easily make both types.

Example Schema:

# coding=utf-8
from marshmallow import Schema
from marshmallow import fields


class DemographicsSchema(Schema):
    """ User Info """
    first_name = fields.String()
    last_name = fields.String()
    dob = fields.Date()

    # Address
    street1 = fields.String()
    street2 = fields.String()
    city = fields.String()
    state = fields.String()
    zip_code = fields.String()

Example Factory


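import factory

# DemographicsSchema and the two mixins are defined in the other snippets
# of this comment; import them from wherever they live in your project.

# ------------------------------------------
#   Demographics
# ------------------------------------------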
class _DemographicsFactory(factory.Factory):
    class Meta:
        model = DemographicsSchema

    first_name = factory.Faker('first_name')
    last_name = factory.Faker('last_name')
    dob = factory.Faker('date_time_this_century', before_now=True)

    # Address
    street1 = factory.Faker('address')
    street2 = factory.Faker('secondary_address')
    city = factory.Faker('city')
    state = factory.Faker('state_abbr')
    zip_code = factory.Faker('zipcode')


class DemographicsStrFactory(_DemographicsFactory, JSONFactoryMixin):
    """ Creates JSON Serialized model of the factory data """


class DemographicsObjFactory(_DemographicsFactory, ObjFactoryMixin):
    """ Creates Deserialized model of the factory data """

My Mixins

# coding=utf-8
import factory


class JSONFactoryMixin(factory.Factory):
    """ Overwrites Factory._create() to produce JSON serialized models """

    @classmethod
    def _create(cls, model_class, *args, **kwargs):
        """Override the default ``_create`` with our custom call."""
        schema = model_class()
        # marshmallow 2 API: dumps() returns a result object exposing .data and .errors
        results = schema.dumps(kwargs)
        assert not results.errors

        return results.data


class ObjFactoryMixin(factory.Factory):
    """ Overwrites Factory._create() to produce deserialized models """

    @classmethod
    def _create(cls, model_class, *args, **kwargs):
        """Override the default ``_create`` with our custom call."""
        schema = model_class()
        # dump() serializes to a plain dict (marshmallow 2 result with .data and .errors)
        results = schema.dump(kwargs)
        assert not results.errors

        return results.data

Example of Use

# imagine we need the serialized version of this model
demographics_json = DemographicsStrFactory()

# if we need the de-serialized version
demographics_dict = DemographicsObjFactory()

etiology avatar Feb 04 '19 08:02 etiology

Has any consensus been reached about the AutoFactory approach? @rbarrois @levic This would be really nice functionality to have to make it less work/verbose to define simple schemas.

simonvanderveldt avatar Feb 22 '19 16:02 simonvanderveldt

@simonvanderveldt I'm still very much in favour of this idea, and had designed a couple of drafts in the past. The main obstacles are:

  1. Finding the time to implement a working proposal for the core engine
  2. Designing a clean API which would allow plugging in custom models/fields/rules for each project
  3. Writing comprehensive documentation and related tests.

Also, we could make this idea easier to find by adding a dedicated issue in the tracker :wink:

rbarrois avatar Feb 24 '19 22:02 rbarrois

👋 I just saw this issue and I've taken this work back:

  • https://github.com/levic/factory_boy/commits/wip/auto_factory
  • https://github.com/FactoryBoy/factory_boy/tree/wip/auto_factory

I've created a draft PR (https://github.com/FactoryBoy/factory_boy/pull/822). @rbarrois, can I create a different issue and close this one in order to discuss the points you mentioned above?

arthurHamon2 avatar Nov 23 '20 17:11 arthurHamon2