Make the name list collator benchmarks more realistic

Open hsivonen opened this issue 8 months ago • 0 comments

Currently, the name list workloads for at least Chinese and Korean look realistic enough, in the sense that multiple names start with the same Han character or Hangul syllable.

However, at least the Latin and Russian name lists are not realistic, in the sense that the family names all seem to be distinct.

If the purpose is to represent a person's contact list, chances are that a non-professional contact list includes people who are related to each other in a way that gives them the same family name. Even if the purpose is to represent some other kind of list, chances are that any directory of people ends up with some duplicate family names.

I suggest that we take at least two or three family names per list and make each of them occur twice.
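
To check whether a list meets this, something like the following sketch could report which family names repeat. It assumes a hypothetical "Family, Given" entry format and made-up sample names; the actual benchmark data files may be laid out differently, so `family_name` would need adapting.

```python
from collections import Counter


def family_name(full_name: str) -> str:
    # Assumption: each entry is "Family, Given"; the real lists may differ.
    return full_name.split(",", 1)[0].strip()


def repeated_family_names(names: list[str]) -> dict[str, int]:
    # Count each family name and keep only those occurring more than once.
    counts = Counter(family_name(n) for n in names)
    return {fam: count for fam, count in counts.items() if count > 1}


# Hypothetical sample list, not taken from the benchmark data.
names = [
    "Smith, Anna",
    "Jones, Ben",
    "Smith, Carl",
    "Brown, Dana",
    "Jones, Eve",
    "Taylor, Finn",
]

print(repeated_family_names(names))  # {'Smith': 2, 'Jones': 2}
```

A list with an empty result here has all-distinct family names, which is the situation described above for the Latin and Russian lists.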

Apr 30 '25 07:04 hsivonen