datafaker Junit 5 extension for injecting Faker object at test methods

Hi I used for some months this project and I saw that sometimes we need to have some boilerplate code for use it. Right now we must to create always an instace and setup manually the injection of the Faker object in our test cases like this

class ExampleTestClass {
    private final Faker faker = new Faker();
    //some tests
}

or

class ExampleTestClass {
    @Test
    void exampleTest(){
        var faker = new Faker();
        var givenIsbn = faker.code().isbn13(false);
        //test logic
    }
}

But this could be repetitive and make that the logic of the test a litte bit messy.

The Idea is to create a Junit5 extension that allows you to inject a shared faker instance in the test with something like this:

@ExtendWith(DataFakerExtension.class)
class ExampleTestClass {
    @Test
    void exampleTest(Faker faker){
        var givenIsbn = faker.code().isbn13(false);
        //test logic
   }
}

I think that this kind of extension would improve the legibility of the tests and the usability of the library.

Oct 25 '24 14:10 agomezlucena

@agomezlucena Why not just put this Faker instance to a static variable?

Oct 25 '24 21:10 asolntsev

I like the idea in theory, it's actually a long time on my list to have better Junit 5 integration. However, at the same time, I'm not sure if Junit should be included in Datafaker, since we like to keep the core as free from dependencies as we can, and it probably wouldn't be ideal to tie it a specific test engine.

Perhaps a module outside of the project would be more suitable at this moment? (datafaker-junit5 module or so, so that perhaps we can have a datafaker-testng or similar module in the future?)

Oct 25 '24 21:10 bodiam

Couldn't you just create a class that extends Faker and has a constant or singleton you can re-use instead of numerous instantiations?

Oct 25 '24 22:10 kingthorin

I don't think there's any reason why you couldn't just create a class level instance of Datafaker, and reuse that over the tests. It's a one line thing, I wouldn't use a faker per test case, and I'm not a fan of using Faker directly in a test class either, I use object mothers in 99% of the time.

But on the other hand, I also wouldn't mind having something like "DatafakerSource" similar to CsvSource etc for parametized tests. It can be done by creating a MethodSource, but I always have to look up the correct syntax. So I was hoping that maybe a junit5 project could be the start of some native integration, and we'll see where it takes us?

Oct 25 '24 23:10 bodiam

@agomezlucena Why not just put this Faker instance to a static variable?

@asolntsev mainly because with the extension we have a better experience of development, because is integrated with junit5 framework, and if the extension grow in complexity, complex things like csvmodel can be also added.

Perhaps a module outside of the project would be more suitable at this moment? (datafaker-junit5 module or so, so that perhaps we can have a datafaker-testng or similar module in the future?)

@bodiam I think, that is way just like mockito with junit.

I'm currently have a prototype, but I think is not good enough, because do just the parameter injection.

Whatever if you want I can create the PR.

Oct 26 '24 07:10 agomezlucena

mainly because with the extension we have a better experience of development, because is integrated with junit5 framework, and if the extension grow in complexity, complex things like csvmodel can be also added.

Sorry, but I don't understand this explanation. I still don't see any benefits in such extension.

If I understand your answer correctly, you want an extension that would be your project-specific, and its complexity would grow with your project. If so, the extension should be located in your project, not in datafaker.

Oct 26 '24 07:10 asolntsev

Whatever if you want I can create the PR.

You can make a PR, so we have something to discuss more indepth, but at this stage, it will probably unlikely that Junit will be a dependency of Datafaker. We have a section on that in the contributing guide here: https://github.com/datafaker-net/datafaker/blob/main/CONTRIBUTING.md#dependencies

However, if the PR is a good basis for further development, a special datafaker-junit layer could be good.

For Mocktio and Junit, things might be different. While Datafaker is mainly a library used in testing, some people use it also for production usage (I do that myself on a few projects). It wouldn't be beneficial to me to get a transitive dependency on JUnit in my production code just by introducing Datafaker, so that's something I'd like to avoid.

Oct 26 '24 10:10 bodiam

@bodiam The idea is not be attached to junit in datafaker dependency, it's to create a separated module for that. But for that we need to make a lot of changes, and I know that my PR must be denied, because has a lot of changes (I made the project a multimode one), but can be useful as inspiration, for future integrations with other testing frameworks.

This is the branch that have the changes: https://github.com/agomezlucena/datafaker/tree/create-junit-extension.

Oct 26 '24 14:10 agomezlucena

@kingthorin

Couldn't you just create a class that extends Faker and has a constant or singleton you can re-use instead of numerous instantiations?

Yes I can make it, but i will need to copy that class in every project / module or create a test fixture for that, and in general this can be standarized and automatized with a extension to junit.

And also this can evolve into inject directly providers or values using annotations (I currently have this kind of functionality in a personal project) and made the code much more cleaner, because you dont need to initialize the data just use an annotation like

@FakerBookCodeProvider(type=ISBN_13) String isbn

Oct 26 '24 15:10 agomezlucena

@agomezlucena

The PR is a bit overwhelming indeed with the rename. Wouldn't it be less change to create the project as a completely separate project, and have a dependency of Datafaker instead? We could publish it, or you foto's publish it under a different name, even something like io.github.datafakerjunit would do, as a sort of incubator? If we notice that there's a reasonable demand for it, we could consider integrating it into the core. Would that be something which could work?

Cause I like the idea, and I like that you built it!

Oct 26 '24 22:10 bodiam

Wouldn't it be less change to create the project as a completely separate project, and have a dependency of Datafaker instead?

Sure I can make it, the main issue that I have in mind is the publication, because if I publish the dependency by myself, nobody will use the extension, maybe if it's possible you could upload it, if you see that ok, I can mantain it in my free time, adding new functionalities like providers injection using annotations.

Oct 27 '24 10:10 agomezlucena

@agomezlucena I don't mind that. If you're okay with it, we can make another repo here, and publish it under the net.datafaker namespace, and add some references to it in the documentation. I'd propose to name it datafaker-junit, similar to some other repositories like cucumber: https://mvnrepository.com/artifact/io.cucumber/cucumber-junit/5.1.2.

Would that work? @kingthorin , @asolntsev what do you think? Want to give this a try?

Oct 27 '24 11:10 bodiam

Sounds good to me. We already have repos for other related things.

Oct 27 '24 12:10 kingthorin

@agomezlucena I don't mind that. If you're okay with it, we can make another repo here, and publish it under the net.datafaker namespace, and add some references to it in the documentation. I'd propose to name it datafaker-junit, similar to some other repositories like cucumber: https://mvnrepository.com/artifact/io.cucumber/cucumber-junit/5.1.2.

Would that work? @kingthorin , @asolntsev what do you think? Want to give this a try?

I have a feeling that nobody listens to me. :(

We don't need this extension.

It doesn't bring any additional value. It's just the same as holding Faker instance in a static variable.

KISS. Avoid unneeded comexity. YAGNI. Amen.

Oct 27 '24 12:10 asolntsev

@asolntsev my thinking here was that @agomezlucena put in the effort of creating this extension, so most likely it serves a need. I rather be convinced by code then by opinions. Perhaps this could be a starting point for adding the things I already described above.

However, maybe it's too early to put this in the Datafaker repo. So, @agomezlucena I think it would be better to put this in your own repo for now, and I'm happy to put a link in the documentation to your extension. If people start using it, there's probably a bigger need, and we'll integrate it into the Datafaker organisation if you still want that.

Oct 27 '24 13:10 bodiam

so most likely it serves a need

Sorry, but this is a very false assumption. :) People tend to make things more complex than needed. People tend to create "fancy" things instead of choosing a simple and stable solution. And responsibility of project maintaners is to build a barrier to avoid unneeded complexity.

@FakerBookCodeProvider(type=ISBN_13) String isbn Yes, this already brings some value. But let's create extension when this gets implemented. Not just an empty extension.

P.S. Though, I personally don't see any value in this annotation as well. This is equvalient to the old good faker.book().whatever(). Exactly one line of code. I don't see any reasons why converting a single line of code to an annotation brings any value. It only brings complexity.

Oct 28 '24 07:10 asolntsev

Sorry, but this is a very false assumption. :) People tend to make things more complex than needed. People tend to create "fancy" things instead of choosing a simple and stable solution. And responsibility of project maintaners is to build a barrier to avoid unneeded complexity.

Perhaps, and some people criticize, and some people create.

P.S. Though, I personally don't see any value in this annotation as well.

That's your right. Other people do. I don't really see the point of this discussion to be honest, if someone wants to build something, let them, and if you don't see the value in it, that's fine, but that shouldn't stop anyone. The whole point of this project is that contributions are welcome, and not that they are shot down before they've landed, so a bit more welcoming to ideas which don't per se align with yours would be appreciated.

PS: I personally still see a lot of value in the Annotation approach. I'm not a big fan of the Schemas, I find them too complex, and something like:

class Person {
   @FirstName
   private val firstName: String,
   @LastName
   private val lastName: String,
}

would be much more expressive and clear to me. And if you don't like it, I can live with that, I'll build it anyway one day, since it serves a need for me.

Oct 28 '24 11:10 bodiam

May be I'm late here anyway just to highlight that there is already existing functionality with population of java/kotlin objects based on predefined schema, also covered in doc https://www.datafaker.net/documentation/schemas/#populating-java-object-with-predefined-schema

it requires some time to get familiar with it.

At the same time the main downside of all those fancy annotations like

@FakeWhateverBasedOnProvider

is that

each providers requires its own annotation(s) and then maintenance
new providers will require new annotations
users of custom providers should create and maintain annotations for themselves

with embedded java object population feature it could satisfy all of them however it is not that user friendly may be could be done something about that

Oct 31 '24 23:10 snuyanzin

each providers requires its own annotation(s) and then maintenance
new providers will require new annotations
users of custom providers should create and maintain annotations for themselves

I was hoping we could generate them perhaps, based on the names of our current classes. I haven't done a POC yet, but that's what I had in mind. I wouldn't think it would be great to maintain them by hand.

Oct 31 '24 23:10 bodiam

yes, they can be generated and released for those which are in datafaker's repo the problem is that it will not cover providers created outside of datafaker repo.

We can create some instructions or even tools however it will not be fully automated

Oct 31 '24 23:10 snuyanzin

yes, they can be generated and released for those which are in datafaker's repo the problem is that it will not cover providers created outside of datafaker repo.

I can live with that. Perhaps there could be a different annotation or something for that (@CustomExpression("my.expression") or something like that.)

Oct 31 '24 23:10 bodiam

up to you

however as I mentioned there is already existing functionality which also allows to avoid that annotation hell (esp when amount of fields is > 5) and same schema (with expressions) could be reused for different objects

Oct 31 '24 23:10 snuyanzin

I'm not a fan of schemas. I find the syntax too complex, and I always have to look up how that works. As a result, I never use it. I think an annotation based approach would be simpler.

Why would the number of fields be an issue regarding annotations?

Oct 31 '24 23:10 bodiam

Why would the number of fields be an issue regarding annotations?

at least one annotation per field -> 2 times more code

usually after 5 fields it becomes noticeable especially if there are multiple tests with same objects which should be injected with data

Oct 31 '24 23:10 snuyanzin

usually after 5 fields it becomes noticeable

What becomes noticable? Sorry, I don't understand what the issue is here. I'm talking about an approach like:

class Person {
   @Firstname
   private String firstName

   @Lastname
   private String lastName

   // .. etc
}

Pretty common if you use JSON/Hibernate. Am I missing something here?

You also said:

at least one annotation per field -> 2 times more code

That would be the same for creating a Schema mapping, it's just moving the code from annotations to a schema. So, not really sure here what the issue is?

Nov 01 '24 02:11 bodiam

Let's consider more concrete case

class Person {
   private String firstName;
   private String lastName;
   private LocalDate dateOfBirth;
   private List<String> phoneNumbers;  // e.g. several phone numbers
}

As you mentioned it could be that Person is already enriched with json/hibernate/whatever annotations

The question would be: are we going to pollute it as well with our test annotations? There might be cases when we can not do this since Person could be defined in another dependency we are not owning and it might not allow to extend it.

going further: 2. by default en locale is used. What if in our case we allow Japanese symbols for name and want to test it or Korean? Based on current PR it is impossible.

Same question about repeatability and ability to provide seed.
dateOfBirth might be of different type: one prefers Date, others Timestamp somebody something else. Finally it means this kind of annotations would be hard to generate and need to maintain manually
How are we going to provide support for collections with annotations when several items should be emitted ? Also seems another pain point.

That would be the same for creating a Schema mapping, it's just moving the code from annotations to a schema.

yes however

now schema could be moved from the actual logic to some test utility class
we are able to have several schemas depending on conditions (for instance as above) and use each of them. for instance for some company employees should have phone number of country where headquarter is + another one for the country where they actually work. In this case there would be employees having just one phone number and employees having 2.
That means that we need to be able to generate different content of objects for these 2 cases. How can we make it with usage of annotations? With schemas we could have 2 schemas and use one or another

Nov 01 '24 07:11 snuyanzin

Pretty common if you use JSON/Hibernate.

Yes. And it's pretty common to call it "Annotation hell". :) And it's pretty common to avoid this hell in new projects, and use JdbcTemplate/JOOQ/Sql2o instead of Hibernate.

Nov 01 '24 11:11 asolntsev

for me the idea behind is to inject data in test method, if we need to create more complex data we just can simply create an annotation with a parameter ~~two class params, the first parameter is for casting the result of the second~~ that can be a factory of the data that contains the shared instace, with this we can create a object mother that contain certain amount entropy, because will generate random, valid values, based in whatever the user want. by instance. This is the user test

@ExtendWith(DatafakerJupiterExtension.class)
class PersonTest {
     @Test
     void shouldAllowToChangeTheName(
        @CustomProvider(generator=PersonGenerator.class) Person givenPerson,
    ) {
        var expectedName= "Jotaro";
        var previousName= givenPerson.getName();
        
        givenPerson.setName(expectedName);
        var obtainedName = givenPerson.getName();

        assertNotEquals(previousName, obtainedName);
        assertEquals(expectedName, obtainedName);
    }
}

This is framework code

@Target(ElementType.PARAMETER)
@Retention(RetentionPolicy.RUNTIME)
public @interface CustomProvider {
       Class<? extend CustomObjectGenerator> generator();
}

public abstract class CustomObjectGenerator<T> {
      protected Faker getFakerInstance(){
           return FakerSharedInstance.getInstance();
      }

     public abstract T newObject();    
}

This is user code

public class PersonGenerator extends CustomObjectGenerator<Person> {
     @Override
     public Person newObject(){
         var faker= getFakerInstace();

         return Person.builder()
              .name(faker.name().firstName())
              .lastName(faker.name().lastName())
              .phoneNumber(faker.phoneNumber().cellPhoneInternational())
              .build();
    }
}

Nov 01 '24 11:11 agomezlucena

based in whatever the user want.

I still didn't get it... just going through the bullets I mentioned in comment https://github.com/datafaker-net/datafaker/issues/1405#issuecomment-2451448289

the second bullet

by default en locale is used. What if in our case we allow Japanese symbols for name and want to test it or Korean?

Faker internally has knowledge about locale and seed which it should use to generate data in the code you posted faker is hidden from the user

how user can generate Japanese name if default locale (when faker create without arg is en)? And if there are 2 tests: for one we need Japanese for another Korean?

Nov 01 '24 11:11 snuyanzin

we can parameterize the locale if is required, and if present we set the locale to the parameterized one and if not we set it to english or system default.

Nov 01 '24 11:11 agomezlucena