Distinct

Open M-Zuber opened this issue 9 years ago • 14 comments

It would be nice to be able to tell GenFu to give me a list in which a certain property is distinct across the whole collection. Something along the lines of the following:

class Foo
{
    public int ID { get; set; }
    public string Name { get; set; } // Is unique in the datastore
}

A.Configure<Foo>()
    .Fill(f => f.Name)
    .Distinct();

M-Zuber avatar Dec 21 '15 14:12 M-Zuber

Agreed, this would be an extremely useful feature.

dpaquette avatar Dec 21 '15 18:12 dpaquette

I have tried working with MoreLinq in the meantime, but I am still getting exceptions in my integration tests. Do you guys have a Gitter/Slack somewhere I can chat for help?

M-Zuber avatar Dec 21 '15 18:12 M-Zuber

Nothing yet, but we are working on getting something set up.

dpaquette avatar Dec 21 '15 19:12 dpaquette

I kind of wonder if maybe the default behaviour should be to generate without duplicates and have an override to force a duplicate to appear. In most cases I would think not having duplicates would be desirable, or at the very least not harmful.

stimms avatar Dec 25 '15 03:12 stimms

A few things to consider if we do it by default:

  1. Performance: Could be slow for large collections.
  2. What do we do if someone asks for 10,000 items and we only have 1,000 unique names in our database?
  3. What should be unique? Consider the usual Person example. FirstName and LastName individually do not need to be unique, but maybe should be unique when combined. Even the FirstName + LastName combination should not be expected to be unique in a large data set.

dpaquette avatar Dec 26 '15 21:12 dpaquette

Huh, those are good things to consider.

  1. We might be able to figure out some sort of a solution with a hashtable for constant-time lookups, but it would be difficult for entities which don't implement IComparable.
  2. Throwing an exception would be the most sensible thing. NotEnoughJunkToFillTheRequestedTrunkException
  3. I was thinking just for individual fields but multiple fields does make more sense.
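
The first two points can be sketched in plain C#, outside of GenFu. This is only an illustration under assumptions: `DistinctPicker` and `PickDistinct` are hypothetical names and not part of GenFu's API, and the exception type simply borrows the name suggested above. The pool is de-duplicated with a `HashSet<T>`, which relies on `Equals`/`GetHashCode`, and the request fails fast when there are not enough unique values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception, named after the suggestion in this thread.
public class NotEnoughJunkToFillTheRequestedTrunkException : Exception
{
    public NotEnoughJunkToFillTheRequestedTrunkException(string message)
        : base(message) { }
}

// Hypothetical helper; not part of GenFu.
public static class DistinctPicker
{
    // Returns `count` distinct values taken from `source` in random order.
    public static List<T> PickDistinct<T>(IEnumerable<T> source, int count, Random random)
    {
        // A HashSet drops duplicate source values using Equals/GetHashCode.
        var pool = new List<T>(new HashSet<T>(source));

        if (count > pool.Count)
        {
            throw new NotEnoughJunkToFillTheRequestedTrunkException(
                $"Requested {count} distinct values but only {pool.Count} are available.");
        }

        // Partial Fisher-Yates shuffle: only the first `count` slots are settled,
        // so no retry loop is needed and each pick takes constant time.
        for (int i = 0; i < count; i++)
        {
            int j = random.Next(i, pool.Count);
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }

        return pool.GetRange(0, count);
    }
}
```

Shuffling a de-duplicated pool avoids the slowdown that rejection sampling suffers when the requested count gets close to the number of unique values.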

stimms avatar Dec 26 '15 23:12 stimms

If we are to throw exceptions I think that turning on distinct should be an option (not a default value/setting). This can be done from defaults or fill. Otherwise people's existing code base will start throwing exceptions.

Regards,

Garry Taylor

gpltaylor avatar Dec 27 '15 12:12 gpltaylor

@M-Zuber and @gpltaylor: What about something like the following?

GenFu.Configure<Person>()
    .UniqueBy(p => p.Firstname)
    .UniqueBy(p => p.Lastname);
var people = A.UniqueList<Person>(25);

This would let you build up the list of properties that you want to, in unison, represent a unique entity. The above, for example, would keep a list of hashes on Firstname and Lastname and throw out duplicates during generation.

If you hadn't set up configuration for the Person object, we'd have to resort to using the entire property set to generate a hash...so, we'd be adding a perf hit here.
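
To make the "list of hashes" idea concrete, here is a rough sketch in plain C#. It is not how GenFu implements anything: `UniqueGenerator`, `Generate`, and the `maxAttempts` cap are hypothetical names introduced for illustration. The combined key is a tuple, which gets value equality and hashing for free, so Firstname and Lastname are compared in unison.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

// Hypothetical helper; not part of GenFu.
public static class UniqueGenerator
{
    // Calls `generate` until `count` items with distinct keys have been collected.
    // `maxAttempts` caps the total number of generate() calls so that an
    // exhausted value pool produces an exception instead of an endless loop.
    public static List<T> Generate<T, TKey>(
        Func<T> generate, Func<T, TKey> keySelector, int count, int maxAttempts)
    {
        var seen = new HashSet<TKey>();
        var result = new List<T>(count);

        for (int attempt = 0; result.Count < count; attempt++)
        {
            if (attempt >= maxAttempts)
            {
                throw new InvalidOperationException(
                    $"Only {result.Count} of {count} unique items found after {maxAttempts} attempts.");
            }

            T item = generate();

            // HashSet.Add returns false when the key was already seen,
            // in which case the duplicate is thrown away.
            if (seen.Add(keySelector(item)))
            {
                result.Add(item);
            }
        }

        return result;
    }
}
```

Usage would look something like `UniqueGenerator.Generate(factory, p => (p.Firstname, p.Lastname), 25, 1000)`, where `factory` is any `Func<Person>`; with GenFu that could presumably be `() => A.New<Person>()`.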

cc/ @dpaquette @stimms

MisterJames avatar Jan 18 '16 01:01 MisterJames

That looks perfect for me. Ensures that it is very clear in the setup what needs to be unique, and allows for a simple flow of different uniqueness for the same model in different scenarios.

M-Zuber avatar Jan 18 '16 04:01 M-Zuber

This looks good, @MisterJames. If we configure the UniqueBy properties, do we need a separate UniqueList<> method? Could we just do A.ListOf<Person>(25)?

dpaquette avatar Jan 18 '16 17:01 dpaquette

@MisterJames Looks good!

bastienJS avatar Jan 22 '16 12:01 bastienJS

How is this feature going? Is it already available? What about the following:

A.Configure<Message>()
    .Fill(x => x.Text)
    .WithRandom(new String[] { "Hello", "How are you?", "Bye" })
    .Distinct();

var messages = A.ListOf<Message>(4000);

If using a cross product between the String array and an Int array the result would be:

"Hello 1"
"How are you? 1"
"Bye 1"
"Hello 2"
"How are you? 2"
"Bye 2"
...

BTW, I am doing this manually as follows:

     new String[] { 
        "Hello", "How are you?", "Bye" 
     }.SelectMany(x => Enumerable.Range(1, 200).Select(y => $"{x} {y}"));

This provides 600 unique messages. Of course, the ideal would be to have 600 real messages, but for testing that is not feasible, and GenFu can ship the most common lists but not all of them.

If adding only the internal index the result would be:

"Hello 1"
"How are you? 2"
"Bye 3"
"Hello 4"
"How are you? 5"
"Bye 6"
...

It probably works in all situations where uniqueness is required and there is no limit on size.

It could also be used like this:

.Distinct(x => $"{x.Text} - {index}");

Where index would be 1, 2, 3, 4, ...
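
The index-suffix variant described above can be tried today without any GenFu support. The following is a small sketch in plain C#; `IndexedValues` and `Generate` are hypothetical names used only for illustration and are not part of GenFu.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of GenFu.
public static class IndexedValues
{
    // Cycles through `seeds` and appends a 1-based index,
    // so every generated string is unique regardless of `count`.
    public static List<string> Generate(IReadOnlyList<string> seeds, int count)
    {
        if (seeds.Count == 0)
        {
            throw new ArgumentException("At least one seed value is required.", nameof(seeds));
        }

        return Enumerable.Range(0, count)
            .Select(i => $"{seeds[i % seeds.Count]} {i + 1}")
            .ToList();
    }
}
```

Calling `IndexedValues.Generate(new[] { "Hello", "How are you?", "Bye" }, 4)` yields "Hello 1", "How are you? 2", "Bye 3", "Hello 4". The resulting list could then be handed to whatever fills the property.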

This is something really useful, as the datastore often requires uniqueness.

What do you think?

mdmoura avatar Feb 29 '16 14:02 mdmoura

We're in 2019... and still missing this feature. Is this package dead? Come on... it's impossible that nobody feels that this is important.

petersontubini avatar Oct 31 '19 16:10 petersontubini

Looks good!

wangengzheng avatar Apr 28 '21 05:04 wangengzheng