faker
Proposal: test for duplicate locale definition values
Clear and concise description of the problem
We have a lot of duplicated values in our locales, like: https://github.com/faker-js/faker/blob/7373a22f33f38d29ff53e4f4588f0137a35132b8/src/locales/en/address/city_name.ts#L6-L8
Suggested solution
Add a test or a CI pipeline that checks for duplications in locales.
Alternative
Don't.
Additional context
Overview of duplicates in the en locale:
- address.city_name: 73
- address.city_suffix: 1
- address.street_suffix: 30
- address.time_zone: 18
- animal.cow: 24
- animal.fish: 1
- animal.insect: 1
- animal.snake: 5
- lorem.words: 67
- name.name: 1
- phone_number.formats: 4
- word.adjective: 8
I would like some feedback on whether a test or a CI pipeline would be the more appropriate solution.
I would put it in a test.
Maybe we can tweak the check so that it only prints the offending entries, e.g.
expect(offenders).toEqual([]), or toBeEmpty if that also shows the offending entries.
Should we also require sorted lists?
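The "only print offending entries" idea could look roughly like the sketch below. This is a minimal illustration, not the check that was eventually merged: the `LocaleDefinitions` shape, the helper names `findDuplicates` and `collectOffenders`, and the sample data are all assumptions made up for this example.

```typescript
// Hypothetical shape of a locale: module name -> definition name -> values.
type LocaleDefinitions = Record<string, Record<string, unknown>>;

// Returns every value that occurs more than once, each listed a single time,
// in the order in which the first repeat was encountered.
function findDuplicates(values: readonly string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const value of values) {
    if (seen.has(value)) {
      duplicates.add(value);
    } else {
      seen.add(value);
    }
  }
  return Array.from(duplicates);
}

// Collects offenders keyed by "module.definition".
// Only arrays are inspected, and only their string entries are compared.
function collectOffenders(locale: LocaleDefinitions): Record<string, string[]> {
  const offenders: Record<string, string[]> = {};
  for (const moduleName of Object.keys(locale)) {
    const definitions = locale[moduleName];
    for (const key of Object.keys(definitions)) {
      const values = definitions[key];
      if (!Array.isArray(values)) continue;
      const strings = values.filter((v): v is string => typeof v === 'string');
      const duplicates = findDuplicates(strings);
      if (duplicates.length > 0) {
        offenders[`${moduleName}.${key}`] = duplicates;
      }
    }
  }
  return offenders;
}

// Made-up sample data, not taken from any real locale file.
const sampleLocale: LocaleDefinitions = {
  address: { city_name: ['Springfield', 'Shelbyville', 'Springfield'] },
  animal: { fish: ['Salmon', 'Trout'] },
};

const offenders = collectOffenders(sampleLocale);
console.log(offenders);
```

In a test, asserting that `offenders` equals an empty object would make the failure output list exactly the duplicated entries, assuming the test runner prints a diff of the two values.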
> I would put it in a test.
One test per module or one "big" test? I would prefer the second option but would like to hear your opinion.
> Should we also require sorted lists?
Do you mean requiring the locale files to be sorted? Just wanna make sure, as I don't see the advantage of this.
> > I would put it in a test.
> One test per module or one "big" test? I would prefer the second option but would like to hear your opinion.
I think both would work well, because there will hardly be many of these failures at once.
> > Should we also require sorted lists?
> Do you mean requiring the locale files to be sorted? Just wanna make sure, as I don't see the advantage of this.
It's probably too much of a hassle for now. I just wanted to bring to everyone's attention that this is also an option. (It makes spotting duplicates easier, though.)
Additionally, we could modify the generate:locales script to filter and sort the files automatically.
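A filter-and-sort step of that kind might be as small as the sketch below. The helper name `dedupeAndSort` is hypothetical, and nothing here reflects how the actual generate:locales script is implemented; it only shows the transformation being proposed.

```typescript
// Removes duplicate entries, then sorts the remainder using the default
// Array.prototype.sort order (comparison by UTF-16 code units).
function dedupeAndSort(values: readonly string[]): string[] {
  return Array.from(new Set(values)).sort();
}

// Made-up sample data for illustration only.
const cleaned = dedupeAndSort(['Trout', 'Salmon', 'Trout', 'Bass']);
console.log(cleaned); // [ 'Bass', 'Salmon', 'Trout' ]
```

The default sort order is not language-aware, so a real script would probably need a locale-specific comparison for non-English data.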
I like the idea of sorting and de-duplication
Also, having generate:locales do that for us (if possible) is a great idea :+1:
I'm just a little bit afraid that it could affect the randomness in a bad way :thinking: But that could be just a wrong assumption :shrug:
Note that some locales contain duplicates (for some formats) to weight them specifically.
https://github.com/faker-js/faker/blob/4c0e41831f8d2fad92f85cea647cbd0873fd842e/src/locales/cz/name/name.ts#L2-L11
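To make the weighting concern concrete: if an entry is picked uniformly at random from a list, repeated entries are proportionally more likely to be chosen, and removing them flattens the distribution. The sketch below assumes uniform selection; the helper name `selectionProbabilities` and the placeholder format names are invented for this example and are not the contents of the linked file.

```typescript
// Probability of each distinct entry when one element of the list is
// picked uniformly at random (assumption: every index is equally likely).
function selectionProbabilities(values: readonly string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  const probabilities = new Map<string, number>();
  counts.forEach((count, value) => {
    probabilities.set(value, count / values.length);
  });
  return probabilities;
}

// Placeholder format names, repeated on purpose to weight the first one.
const weightedFormats = ['plain', 'plain', 'plain', 'with_prefix'];
const dedupedFormats = Array.from(new Set(weightedFormats));

console.log(selectionProbabilities(weightedFormats)); // plain: 0.75, with_prefix: 0.25
console.log(selectionProbabilities(dedupedFormats)); // plain: 0.5, with_prefix: 0.5
```

A duplicate check would therefore need some way to exempt lists where the repetition is intentional.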
Fixed in https://github.com/faker-js/faker/pull/1137