factory_boy icon indicating copy to clipboard operation
factory_boy copied to clipboard

The unreleased `Transformer` has a conflicting API with the overall design

Open rbarrois opened this issue 2 years ago • 7 comments

Description

For most declarations, I can override the default mechanism by passing an explicit value:

>>> UserFactory().email
"[email protected]"
>>> UserFactory(email="[email protected]").email
"[email protected]"

If the factory uses a factory.Transformer field, I can't force a value:

>>> UserFactory().password
"pbkdf2_sha256$216000$Gj2b9f0Ln1ms$1dBlLEYrtGiwEA219JU831VAdscD2f9nFY37PrNfCDU="
>>> UserFactory(
...     password="pbkdf2_sha256$216000$Gj2b9f0Ln1ms$1dBlLEYrtGiwEA219JU831VAdscD2f9nFY37PrNfCDU="
... ).password
"pbkdf2_sha256$216000$EUKlFdNViyB5$vcfPaK6H7pDiQAa9ZIbFoj5oj55tUyHKHDUfxI4GIeY="

As a user, it is quite confusing to have very different behaviours for fields; if I have a specific password hash (in this example), I need to have a way to set it.

Next steps

I'm not sure we can officially release the factory.Transformer (and related code) without addressing this topic first; how can a user bypass the transform?

rbarrois avatar Sep 14 '21 11:09 rbarrois

@francoisfreitag As the author of the factory.Transformer feature, I'm quite interested in your opinion here ;)

rbarrois avatar Sep 14 '21 11:09 rbarrois

The use case for Transformer is to apply the transform function to the value. I don’t expect Transformer declarations to be used in cases where the user sometimes want the value to be transformed, sometimes not. A separate helper function would be much clearer for that use case.

I agree an escape hatch would be useful when the vast majority of inputs need transformation, but some special cases are to be tested with a raw value. That’s what RawValue from #888 provides.

francoisfreitag avatar Sep 20 '21 12:09 francoisfreitag

Side note: it felt a bit weird to find it mentioned in the documentation, but then getting a missing import when trying to use it.

Kjir avatar Nov 04 '21 10:11 Kjir

Any progress? If this blocking next release, shouldn't it have the highest priority?

PerchunPak avatar Jul 06 '22 18:07 PerchunPak

I'd say more: I would expect being able to override the declaration with other arbitrary declarations (fakers, self-attributes, lazy-functions, subfactories... anything!), which now will most probably cause a crash because the function is applied to the declaration before it is evaled?

class F(factory.DictFactory):
    name = factory.Transform(str.upper, "jane")

F.build(name=factory.Faker("first_name_female", locale="hi"))  # crash

Would this be covered by the proposed #888 , @francoisfreitag ?


Side note: it felt a bit weird to find it mentioned in the documentation, but then getting a missing import when trying to use it.

@Kjir If you are reading the "latest" documentation page instead of the "stable" you should be expecting this all the time!

n1ngu avatar Sep 17 '22 12:09 n1ngu

Thanks for the input. I did not have that use in mind when developing the transformers. A Transformer takes a value, applies a transform function, then the factory uses the transformed value. The issue here is a value generator is being passed to the factory, not an actual value.

I see the following traceback with your example:

Traceback (most recent call last):
  File "", line 1, in 
  File "/home/freitafr/dev/factory_boy/factory/base.py", line 511, in build
    return cls._generate(enums.BUILD_STRATEGY, kwargs)
  File "/home/freitafr/dev/factory_boy/factory/base.py", line 465, in _generate
    return step.build()
  File "/home/freitafr/dev/factory_boy/factory/builder.py", line 242, in build
    pre, post = parse_declarations(
  File "/home/freitafr/dev/factory_boy/factory/builder.py", line 162, in parse_declarations
    extra_maybenonpost[k] = pre_declarations[k].declaration.function(v)
TypeError: descriptor 'upper' for 'str' objects doesn't apply to a 'Faker' object

Can you explain why the call is passing a Faker object rather that the value generated by Faker?

F.build(name=factory.Faker("first_name_female", locale="hi"))  # crash

I would write:

fake = Faker(["hi"])
F.build(name=fake.first_name_female())

The original use case is for Django (or more generally ORMs) to deal with password hashing, see https://github.com/FactoryBoy/factory_boy/issues/366. That allows writing the cleartext password in the factory call: UserFactory.create(password="hunter2"), yet store the hashed version in the database. See https://factoryboy.readthedocs.io/en/latest/orms.html#factory.django.Password for more.

francoisfreitag avatar Sep 17 '22 15:09 francoisfreitag

I'd like to move this forward; here are a couple of additional considerations we should take into account ;)

Requirements

With the following code (mixing examples from @francoisfreitag and @n1ngu):

class UserFactory(factory.Factory):
    fullname = factory.Transformer(str.upper, "Joan Deer")
    password = factory.Transformer(make_password, "password123")

I believe we want the following options to be possible:

  • Providing a value generator (i.e another declaration) and having the transformation apply to it;
  • Forcing a specific value;
  • Reduce the risk of users "losing" track of the raw value that was provided to the transform (in @n1ngu's example, can we keep track of the pre-upper() name?).

Solving these would also address the topic discussed in #963.

Current behaviour

With the 3.2.1 code, the user can rely on assert MyFactory(field=value).field == value, except whe the factory performs some mangling through Factory._meta.rename.

In that situation, achieving the goals of Transformer would require:

class UserFactory(factory.Factory):
  class Meta:
    name_raw = factory.Faker("fullname")
  name = factory.LazyAttribute(lambda o: o.name_raw.upper())

The user would then use:

  • UserFactory(name_raw="value") to force a specific pre-transformation value;
  • UserFactory(name="value") to force a specific final value

Alternative implementations

Other options, for instance suggested in #963, would be to declare the transformation as part of the LazyAttribute or SelfAttribute:

class UserFactory(factory.Factory):
  class Meta:
    name_raw = factory.Faker("fullname")
 name = factory.SelfAttribute("name_raw", transform=str.upper)

However, that pattern would require the addition of a transform= keyword argument to all declarations, which would be cumbersome and might conflict with named arguments expected by other declarations (e.g. SubFactory).

Declarations with similar issues

The following declarations exhibit a similar issue to the proposed factory.Transformer:

factory.Trait

Once the class declaration of a factory using a Trait has been parsed by Python, the Trait is converted into a set of Maybe; callers cannot provide a new value for the trait afterwards, or "delete" the Maybe portion.

factory.PostGeneration

Those declarations have specific handling: if the caller provides a non-declaration value, it is provided to the function. If, instead, a factory post-generation declaration is provided, it overrides the declaration.

factory.RelatedFactory

Similar to factory.PostGeneration, a user can bypass a call to a related factory by providing a nake value to the factory: UserFactory(main_group=SomeGroup). If, instead, they provide another RelatedFactory declaration, that value would override the existing RelatedFactory generation: UserFactory(main_group=factory.RelatedFactory(LimitedGroupFactory))

Possible options

Special declaration

As proposed in #888, and similarly to the implementation for factory.PostGeneration, we could special-case a declaration:

>>> UserFactory(name="Joan Doer").name
"JOAN DOER"
>>> UserFactory(name=factory.Force("Joan Doer")).name
"Joan Doer"

Specific keyword argument

We could support an explicit keyword argument to force the raw value:

>>> UserFactory(name="Joan Doer").name
"JOAN DOER"
>>> UserFactory(name__="Joan Doer").name
"Joan Doer"

Discussion

The "special declaration" case requires adding a new kwarg, which is only useful for the factory.Transformer case.

  • It wouldn't make sense for factory.Trait — as the Trait declaration has already been parsed at class declaration time.
  • It wouldn't make sense for factory.PostGeneration, as those declarations are supposed to alter the instance, not provide a field.
  • It would provide a duplicated API for bypassing factory.RelatedFactory, where one could call either UserFactory(main_group=SomeGroup) or UserFactory(main_group=factory.Force(SomeGroup)) to bypass the related factory — although the latter might be clearer.

The "specific keyword" pattern is similar to the behaviour used by @post_generation and PostGenerationMethodCall:

  • The user can call UserFactory(main_group__=SomeGroup) to bypass a main_group = factory.RelatedFactory(...) declaration,;
  • They can call UserFactory(make_password__="hunter2").

Additional considerations

What happens with the following factories?

class UserFactory(factory.Factory):
  credentials = factory.Dict(
    password=factory.Transformer(make_password, "hunter2),
  )

class userFactory(factory.Factory):
  class Params:
    with_strong_password = factory.Trait(password="A Very Strong Password !!")
  password = factory.Transformer(make_password, "hunter2")

class TenantFactory(factory.Factory):
  name = factory.Faker("company_name")
  register_main_dns = factory.PostGenerationMethodCall("add_dns_alias")
  register_main_dns__ = factory.Transformer(slugify_dns, factory.SelfAttribute("name"))

Conclusion

I believe the best way forward is:

  • Fix the factory.Transformer declaration to be able to accept either a pre- or post- declaration, as Maybe and Trait do;
  • Enrich it, where a user can provide a forced value to name = factory.Transformer(...) with name__ = x

rbarrois avatar Sep 24 '22 11:09 rbarrois

Thanks for sharing the detailed analysis!

I agree that changing factory.Transformer to accept a pre- declaration would be an improvement, happy to give that a shot.

However, I am against the post- declaration part: the Transformer is meant to transform the value before to instantiate the object. This declaration was specifically introduced to deal with #366, which was a blocker for #316. My concern with a post-generation would be to keep encouraging the (bad) practice of saving the generated object twice with the CREATE strategy. If a post- declaration is allowed, an additional save is needed for the object in database to match the Python instance, going against #316. Unless a clear use case is presented, where Transformer would be a real gain, I’m convinced it’s better not to implement it.

Enrich it, where a user can provide a forced value to name = factory.Transformer(...) with name__ = x

This syntax is very unclear to me. It’s also not documented, not tested and does not appear a single time in the library repo. I would rather consider it an implementation detail of the library, and not start advertising it or even commit to supporting it. A GitHub search (although difficult because the = sign is ignored) does not yield usage either. Raw (or Force) could be declared inner to Transformer, so that users need to specify Transformer.Raw to wrap the value, clarifying its scope and preventing confusion for Trait or Maybe:

class Transformer:
    class Raw:
         # Raw implementation
    # Transformer implementation

What do you think?

francoisfreitag avatar Oct 21 '22 12:10 francoisfreitag

#1006 offers an update to handle pre-declarations with Transformer. Please feel free to review it and share your feedback!

francoisfreitag avatar Mar 18 '23 09:03 francoisfreitag

@francoisfreitag I like the approach in #1006, let's merge that!

Sorry for being away for so long 😕

rbarrois avatar Apr 05 '23 21:04 rbarrois

That’s great to hear, thanks! No worries, it’s all volunteer work, it always depends on what’s going on IRL :smile:

francoisfreitag avatar Apr 06 '23 07:04 francoisfreitag

It appears that the release is being blocked by this #886 however it's suggested that #1006 should be merged, https://github.com/FactoryBoy/factory_boy/pull/1006#issuecomment-1499322525

I would be happy to help fix it, however, I am currently confused as to which is the Solution to be merged?

PrenSJ2 avatar Jun 13 '23 16:06 PrenSJ2

The solution is #1006, augmented by #1013. I just updated #1006 to pull the improvements from #1013. If @rbarrois agrees, we should be integrating that PR soon.

francoisfreitag avatar Jun 13 '23 17:06 francoisfreitag