datafaker icon indicating copy to clipboard operation
datafaker copied to clipboard

Data generation repeatability and Faker attribute access

Open jpeffer opened this issue 3 years ago • 22 comments

Is your feature request related to a problem? Please describe. In order to support data generation repeatability, it is important to be able to easily set the Random seed. Currently, the Faker constructor signatures make this a difficult task when a Faker uses another Faker internally.

Faker A should be able to easily pass along its Random when constructing Faker B. While you can get access to the RandomService, you cannot easily obtain the Random object within it. Using the RandomService to construct Faker B then requires other parameters to do so.

A similar problem exists with Locales last I recall, where it is difficult to access the current configuration of a given Faker.

Describe the solution you'd like Attributes like Random and Locale should be accessible after a given Fake is constructed. This would improve the ability to utilize Faker's internally within other fakers, and help to improve logging capabilities where it might be important to trace some of these elements for test repeatability.

jpeffer avatar May 28 '22 14:05 jpeffer

It would be great if you provide a reproducible test case for that

snuyanzin avatar May 28 '22 15:05 snuyanzin

Also, what do you mean with Faker A and Faker B? Are you talking about Faker, or about a data provider like Address or Name?

bodiam avatar May 28 '22 15:05 bodiam

Also, what do you mean with Faker A and Faker B? Are you talking about Faker, or about a data provider like Address or Name?

Sorry, I don't have access to the example I was playing around with at the moment, so I will need to get back to you with those details.

The use case at a high level is that I have data providers that are being used to create a fluent API for test data generation. Due to how Locales get set, an English Faker may have the need to internal generate non-English data.

Consider a Company / Organization that may be a subsidiary of a foreign company or it may itself own a foreign company. In this case, I need to generate a Company with both US and non-US postal addresses. To support this, my data provider is creating a new foreign locale Faker to generate foreign addresses within a different locale, in order to provide a mix of locales and support the fluent API design. Additionally, I want to be able to pass along either the locale and / or seed currently in use to the internal Faker that was needed.

I can provide a more concrete example later when I have a moment.

jpeffer avatar May 28 '22 16:05 jpeffer

Wouldn't it make sense to make your own config class then? FakerConfig, and using that config you create your provider? Then you can pass around that config to each of your fakers.

bodiam avatar May 28 '22 22:05 bodiam

@jpeffer is this still an issue? If not, I'll close the issue.

bodiam avatar Jun 09 '22 10:06 bodiam

@bodiam Yes, my apologies for not getting back to you, I've just been swamped lately. I was planning to provide some examples.

I think the primary question is why would the Faker API not provide the ability to, after construction, be able to obtain the seed in use or the locale? While I could circumvent the current API with a FakerConfig as you suggested, I would view that as working around the library.

jpeffer avatar Jun 09 '22 13:06 jpeffer

How can the Faker know the seed? It's (potentially) instantiated with a Random object, which doesn't expose the seed. So, if you'd do:

val faker = Faker(Random(1234)), there's no way for the faker to give you back 1234. The only one knowing the seed is the caller, which is you.

Also if you'd do:

val faker = Faker(), and image there would be a getSeed() method, what would it return? A null? Throw an exception?

Having said that, if this is important to you, we're not against having a feature like this.

Regarding the Locale, there is already a method for it which is not public. I'm not entirely sure why, but I have exposed it for you. Besides that, you can get access to the Random used internally by using faker.random().randomInternal(), which you can use to get access to the Random object being used, which you can use to generate another Faker if you need to.

bodiam avatar Jun 09 '22 14:06 bodiam

Correct, the initialization of a Faker with a provided Random is problematic. I think that opens the question though of whether you need to be able to provide a Random. If there is no Faker constructor accepting Random and one that instead accepts a seed, the problem goes away right?

jpeffer avatar Jun 09 '22 15:06 jpeffer

You don't need to provide a random, you can also instantiate a Faker without one, but that doesn't address your problem I guess.

Send us a PR or some code with what you want, and we can discuss, that makes things a bit easier.

bodiam avatar Jun 09 '22 23:06 bodiam

Regarding

Consider a Company / Organization that may be a subsidiary of a foreign company or it may itself own a foreign company. In this case, I need to generate a Company with both US and non-US postal addresses. To support this, my data provider is creating a new foreign locale Faker to generate foreign addresses within a different locale, in order to provide a mix of locales and support the fluent API design. Additionally, I want to be able to pass along either the locale and / or seed currently in use to the internal Faker that was needed.

you might be interested in having a look at a related PR https://github.com/datafaker-net/datafaker/pull/214

snuyanzin avatar Jun 11 '22 09:06 snuyanzin

you might be interested in having a look at a related PR #214

Saw this today and love it. Thanks for the change.

jpeffer avatar Jun 11 '22 16:06 jpeffer

@bodiam I have opened PR https://github.com/datafaker-net/datafaker/pull/215 to support further discussion. Let me know if you have any questions. I wasn't too sure if you would be open to a breaking change, but I felt this was a cleaner approach that at least warranted a discussion. There are alternatives if you feel strongly about retaining Random as a constructor param.

jpeffer avatar Jun 11 '22 16:06 jpeffer

@jpeffer I'm not sure keen on breaking API changes, however, if this is a change which works for you, and nobody else is using the faker with Random API, then I think the fact that you did the effort to make the change speaks in favor of having this change. Any chance there a way to keep both constructors?

bodiam avatar Jun 12 '22 01:06 bodiam

Certainly there are a few options that come to mind. I shared my thoughts in the PR. Let me know what you think. The first iteration was just to help the discussion.

jpeffer avatar Jun 12 '22 15:06 jpeffer

I was working on some refactoring to address your concerns and got a little confused about the responsibility of Faker and FakeValuesService. Many of the methods in the FakeValuesService are exposed directly in Faker, but the fakeValueService getter is restricted.

Are there methods in FakeValueService that you do not want to expose? We could create an interface that exposes only the methods we want publicly accessible and then return that class from Faker.

Something like FakeValueService extends FakeValueServiceProvider, where FakeValueServiceProvider defines methods we're ok making public.

Then, within Faker, we could have public FakeValueServiceProvider fakeValueService() . Then all of the providers could call faker.fakeValueService.whatever().

Just wanted to bring it up so I understood the intent of the FakeValueService, so I could incorporate it into the refactoring effort. The primary goal will still be to provide an unchanged interface to avoid breaking changes, while introducing some of the changes I have mentioned before.

jpeffer avatar Jun 14 '22 01:06 jpeffer

@jpeffer Thanks for looking at this. I have no issues with exposing things to the outside. I've worked for a while on the Fitnesse codebase, and it often annoyed me that methods which I needed where private or so, so I'd say: open it all.

I have more issues with removing methods, since that could break things, and yes, the more we expose, the higher the chance of breaking the API, but if it helps you to expose it, do it, no problem at all.

epragtbeamtree avatar Jun 14 '22 06:06 epragtbeamtree

@epragtbeamtree, @bodiam I've opened a new PR that aims to address not breaking the existing API, but lays a foundation for decoupling the data providers from the single Faker instance. I also believe I discovered a bug in the FakeValuesService (or the data provider configuration files, depending on your perspective), which was causing the FakerIT tests to fail on master.

Happy to talk over the approach and what future enhancements might be desired. Take a look and let me know what you think:

https://github.com/datafaker-net/datafaker/pull/221

jpeffer avatar Jun 20 '22 01:06 jpeffer

I want to be able to pass along either the locale and / or seed currently in use to the internal Faker that was needed.

I've submitted a PR which now allows to pass locale and/or seed to faker. However it will not give you current seed

https://github.com/datafaker-net/datafaker/pull/222

snuyanzin avatar Jun 23 '22 22:06 snuyanzin

Are there concerns with the PR I put together in PR #221? I believe the refactor introduces a number of desirable changes to the structure and extensibility of the code.

jpeffer avatar Jun 24 '22 03:06 jpeffer

currently no concerns, however this change (#222 ) was pretty straightforward and could be used for old api. At the same time it does not introduce as many new features as a new api. Only one.

snuyanzin avatar Jun 24 '22 05:06 snuyanzin

Unfortunately #222 does not address the original need entirely. I do however like the idea of passing in a seed to doWith, but I still believe setting an immutable seed, which can also be retrieved, which is useful for logging or other purposes, is more flexible. I'd be happy to the doWith capability into my PR.

jpeffer avatar Jun 30 '22 03:06 jpeffer

your contributions are welcome

snuyanzin avatar Jul 23 '22 05:07 snuyanzin

Hi @jpeffer , I believe some of the issues mentioned here have been addressed, but I think to keep the conversation focused, it's probably better to open a new issue if there are any concerns or improvements you'd like to see.

bodiam avatar Sep 25 '22 07:09 bodiam