faker icon indicating copy to clipboard operation
faker copied to clipboard

Proposal: test for duplicate local definition values

Open xDivisionByZerox opened this issue 3 years ago • 8 comments
trafficstars

Clear and concise description of the problem

We have a lot of duplicated values in our loves like: https://github.com/faker-js/faker/blob/7373a22f33f38d29ff53e4f4588f0137a35132b8/src/locales/en/address/city_name.ts#L6-L8

Suggested solution

Add a test or a CI pipeline that checks for duplications in locales.

Alternative

Dont.

Additional context

Overview for duplications in the en local

  • address.city_name: 73
  • address.city_suffix: 1
  • address.street_suffix: 30
  • address.time_zone: 18
  • animal.cow: 24
  • animal.fish: 1
  • animal.insect: 1
  • animal.snake: 5
  • lorem.words: 67
  • name.name: 1
  • phone_number.formats: 4
  • word.adjective: 8

xDivisionByZerox avatar May 20 '22 19:05 xDivisionByZerox

Would like to have some information if a test or pipeline would be the appropriate solution

xDivisionByZerox avatar May 20 '22 19:05 xDivisionByZerox

I would put it in a test. Maybe we can tweak the check in a way to only print offending entries. aka expect(offenders).toEqual([]) or toBeEmpty if it shows the offending entries.

ST-DDT avatar May 20 '22 22:05 ST-DDT

Should we also require sorted lists?

ST-DDT avatar May 20 '22 22:05 ST-DDT

I would put it in a test.

One test per module or one "big" test? I would prefer the second option but would like to hear your opinion.

Should we also require sorted lists?

Do you mean to require the local files to be sortet? Just wanna make sure, as I don't see the advantage of this.

xDivisionByZerox avatar May 21 '22 09:05 xDivisionByZerox

I would put it in a test.

One test per module or one "big" test? I would prefer the second option but would like to hear your opinion.

I think both would work well, because there will hardly be many of these failures at once.

Should we also require sorted lists?

Do you mean to require the local files to be sortet? Just wanna make sure, as I don't see the advantage of this.

Its probably too much of a hassle for now. I justed wanted to bring this to everyones attention, that this is also an option. (Makes spotting duplicates easier though)

ST-DDT avatar May 21 '22 09:05 ST-DDT

Additionally, we could modify the generate:locales script to filter and sort the files automatically.

ST-DDT avatar May 21 '22 09:05 ST-DDT

I like the idea of sorting and de-duplication Also that generate:locales will do that for us (if possible) is a great idea :+1:

I'm just a little bit afraid that it could affect the randomness in a badly way :thinking: But that could be just a wrong assumption :shrug:

Shinigami92 avatar May 21 '22 16:05 Shinigami92

Note that some locales contain duplicates (for some formats) to weight them specifically.

https://github.com/faker-js/faker/blob/4c0e41831f8d2fad92f85cea647cbd0873fd842e/src/locales/cz/name/name.ts#L2-L11

ST-DDT avatar May 21 '22 17:05 ST-DDT

Fixed in https://github.com/faker-js/faker/pull/1137

ST-DDT avatar Jan 19 '23 17:01 ST-DDT