faker
faker copied to clipboard
test(date): Add Intl-Based Tests for Date Definitions to Ensure Completeness of Weekdays and Months
#2904 This PR is about Intl-based tests for the date definitions in Faker.js. These tests ensure that the weekdays and months are correctly defined and returned across different locales, enhancing the accuracy and completeness of the data.
Deploy Preview for fakerjs ready!
| Name | Link |
|---|---|
| Latest commit | 78c7183a1fbcb70a9210b5fcd3c712016910f513 |
| Latest deploy log | https://app.netlify.com/sites/fakerjs/deploys/667202f079e1040008bd04f0 |
| Deploy Preview | https://deploy-preview-2912.fakerjs.dev |
| Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.95%. Comparing base (
80d5aad) to head (78c7183). Report is 194 commits behind head on next.
Additional details and impacted files
@@ Coverage Diff @@
## next #2912 +/- ##
==========================================
- Coverage 99.95% 99.95% -0.01%
==========================================
Files 2987 3051 +64
Lines 216052 217787 +1735
Branches 603 948 +345
==========================================
+ Hits 215952 217686 +1734
- Misses 100 101 +1
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/locales/af_ZA/date/index.ts | 100.00% <100.00%> (ø) |
|
| src/locales/af_ZA/date/month.ts | 100.00% <100.00%> (ø) |
|
| src/locales/af_ZA/date/weekday.ts | 100.00% <100.00%> (ø) |
|
| src/locales/af_ZA/index.ts | 100.00% <100.00%> (ø) |
|
| src/locales/ar/date/month.ts | 100.00% <100.00%> (ø) |
|
| src/locales/ar/date/weekday.ts | 100.00% <100.00%> (ø) |
|
| src/locales/az/date/month.ts | 100.00% <100.00%> (ø) |
|
| src/locales/az/date/weekday.ts | 100.00% <100.00%> (ø) |
|
| src/locales/cs_CZ/date/month.ts | 100.00% <100.00%> (ø) |
|
| src/locales/cs_CZ/date/weekday.ts | 100.00% <100.00%> (ø) |
|
| ... and 143 more |
Is this PR ready for review yet?
Because I dont see Intl in the tests anywhere.
@ST-DDT If I create tests with JavaScript Intl.DateTimeForamt, most of them will not pass. Did I proceed correctly??
it('should use Intl.DateTimeFormat to get the month name in the correct locale', () => {
const date = new Date(2020, 0, 1);
const intlMonth = new Intl.DateTimeFormat(locale, {
month: 'long',
}).format(date);
expect(localizedFaker.definitions.date.month.wide).toContain(
intlMonth
);
});
it('should use Intl.DateTimeFormat to get the abbreviated month name in the correct locale', () => {
const date = new Date(2020, 0, 1);
const intlMonth = new Intl.DateTimeFormat(locale, {
month: 'short',
}).format(date);
expect(localizedFaker.definitions.date.month.abbr).toContain(
intlMonth
);
});
I'm currently on mobile and thus cannot run the tests myself. The tests look good to me (although all 12 months need to be checked). ~~Also I'm not sure whether the current setting returns only the month or the full date string, for that I need some of the failure messages.~~
@ST-DDT These are some failure messages.
AssertionError: expected [ 'Luni', 'Marți', 'Miercuri', 'Joi', 'Vineri', 'Sâmbătă', 'Duminică' ] to include 'sâmbătă' AssertionError: expected [ 'нед', 'пон', 'вто', 'сре', 'чет', 'пет', 'саб' ] to include 'саб.' AssertionError: expected [ 'Hé', 'Ke', 'Sze', 'Csüt', 'Pé', 'Szo', 'Va' ] to include 'V'
Failure factors can range from missing data to capitalization, dots, etc.
describe.each(Object.entries(allLocales))(
'for locale %s',
(locale, fakerLocale) => {
let localizedFaker: Faker;
const weekdays = Array.from(
{ length: 7 },
(_, i) => new Date(2020, 0, i + 4)
); // January 4-10, 2020 are Sunday to Saturday
beforeAll(() => {
localizedFaker = new Faker({ locale: [fakerLocale, en, base] });
});
it(`should return random value from date.weekday.wide_context array for context option in ${locale}`, () => {
if (localizedFaker.definitions.date.month.wide_context) {
const weekday = localizedFaker.date.weekday({ context: true });
expect(
localizedFaker.definitions.date.weekday.wide_context
).toContain(weekday);
}
});
it(`should return random value from date.weekday.abbr_context array for abbreviated and context option in ${locale}`, () => {
if (localizedFaker.definitions.date.month.abbr_context) {
const weekday = localizedFaker.date.weekday({
abbreviated: true,
context: true,
});
expect(
localizedFaker.definitions.date.weekday.abbr_context
).toContain(weekday);
}
});
it('should use Intl.DateTimeFormat to get the weekday name in the correct locale', () => {
for (const date of weekdays) {
const intlWeekday = new Intl.DateTimeFormat(locale, {
weekday: 'long',
}).format(date);
expect(localizedFaker.definitions.date.weekday.wide).toContain(
intlWeekday
);
}
});
it('should use Intl.DateTimeFormat to get the abbreviated weekday name in the correct locale', () => {
for (const date of weekdays) {
const intlWeekday = new Intl.DateTimeFormat(locale, {
weekday: 'short',
}).format(date);
expect(localizedFaker.definitions.date.weekday.abbr).toContain(
intlWeekday
);
}
});
}
);
This is my current test code.
Please note: We want the test to ensure that we have the correct values, currently the values might be invalid.
AssertionError: expected [ 'Luni', 'Marți', 'Miercuri', 'Joi', 'Vineri', 'Sâmbătă', 'Duminică' ] to include 'sâmbătă'
If I insert sâmbătă or Sâmbătă into google translate and translate it back and forth, then I get the lowercase variant.
Could you please fix the invalid values and replace it with correct ones?
Maybe using a onetime script like:
for [locale] in allLocales {
const weekdayPath = resolve("src/locales/${locale}/date/weekday.ts");
const storedWeekdays = await import(weekdayPath) as DateDefinition["weekday"];
const long = [];
const abbr = [];
for (const i=0;i<7;i++) {
long.push(new Intl...);
abbr.push(new Intl....);
}
storedWeekdays.long = long;
storedWeekdays.abbr = abbr;
writeFileSync(weekdayPath, "export default "+JSON.toString(storedWeekdays));
}
And run pnpm run preflight afterwards.
EDIT: Please post the script here, so that we can use it during the review.
Please merge next into your branch to avoid merge conflicts, because we recently merged #2902.
- #2902
EDIT: Please post the script here, so that we can use it during the review.
If you post the script here, we can later integrate it in our generate locale script, so that all locales always have the correct values.
Or if you want to do that directly you can add it after generateLocaleFile:
https://github.com/faker-js/faker/blob/7dc8a18f0b906f959670627240a6e5600bf47b75/scripts/generate-locales.ts#L414-L422
@ST-DDT When I run the script that creates the files in the date folder, all tests now pass. However, there are a few things to check. The data previously entered for each locale is also If Intl returns English, it will be converted to English. Also, the 'vd' code was returned by Intl in my local language, so I added an exception condition.
I am a novice developer, so please understand my embarrassing generator code.
Rather than just overwriting the existing data, could we output a table showing where the data produced by Intl differs from what we already have in Faker? That would make it easier to see discrepancies, and loop in native speakers if needed to see why.
@matthewmayer That seems like a good idea. Can someone please help?? I have other things to do, so it might take some time.
Here are the discrepancies i could find, for "wide" data (the abbr/short data has more variation), ignoring case-only differences and ignoring data where Intl doesnt seem to provide correct data eg dv and sr_RS_latin.
| data | type | locale | Faker | Intl |
|---|---|---|---|---|
| weekday | wide | fa | پتچشنبه,جمعه,چهارشنبه,دوشنبه,سه شنبه,شنبه,یکشنبه | پنجشنبه,جمعه,چهارشنبه,دوشنبه,سهشنبه,شنبه,یکشنبه |
| weekday | wide | he | יום חמישי,יום ראשון,יום רביעי,יום שישי,יום שלישי,יום שני,שבת | יום חמישי,יום ראשון,יום רביעי,יום שבת,יום שישי,יום שלישי,יום שני |
| weekday | wide | pt_BR | Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça | domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira |
| weekday | wide | pt_PT | Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça | domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira |
| weekday | wide | ur | اتور,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ | اتوار,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ |
| weekday | wide | zh_CN | 星期二,星期六,星期三,星期四,星期天,星期五,星期一 | 星期二,星期六,星期日,星期三,星期四,星期五,星期一 |
| weekday | wide | zh_TW | 星期一,星期二,星期三,星期五,星期六,星期天,星期四 | 星期一,星期二,星期三,星期五,星期六,星期日,星期四 |
| month | wide | ar | آب,آذَار,أَيَّار,أَيْلُول,تِشْرِين ٱلْأَوَّل,تِشْرِين ٱلثَّانِي,تَمُّوز,حَزِيرَان,شُبَاط,كَانُون ٱلْأَوَّل,كَانُون ٱلثَّانِي,نَيْسَان | أبريل,أغسطس,أكتوبر,ديسمبر,سبتمبر,فبراير,مارس,مايو,نوفمبر,يناير,يوليو,يونيو |
| month | wide | ur | اپریل,اکتوبر,اگست,جنوری,جولائ,جون,دسمبر,ستمبر,فروری,مارچ,مئ,نومبر | اپریل,اکتوبر,اگست,جنوری,جولائی,جون,دسمبر,ستمبر,فروری,مارچ,مئی,نومبر |
| month | wide | uz_UZ_latin | Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktyabr,Sentyabr,Yanvar | Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktabr,Sentabr,Yanvar |
| month | wide | vi | Tháng Ba,Tháng Bảy,Tháng Chín,Tháng Giêng,Tháng Hai,Tháng Mười,Tháng Mười Hai,Tháng Mười Một,Tháng Năm,Tháng Sáu,Tháng Tám,Tháng Tư | Tháng 1,Tháng 10,Tháng 11,Tháng 12,Tháng 2,Tháng 3,Tháng 4,Tháng 5,Tháng 6,Tháng 7,Tháng 8,Tháng 9 |
i think you really need a language speaker to check these manually. For example, in Chinese which I do know, both 星期天 and 星期日 are both perfectly valid words for "Sunday".
In Vietnamese it looks like we are spelling out the month numbers versus using digits.
In Uzbek there seems to be some debate about the correct transliteration for September and October - see https://daryo.uz/2018/01/15/sentabr-yoziladimi-yoki-sentyabr , perhaps @Mirazyzz can help clarify!
Here are the discrepancies i could find, for "wide" data (the abbr/short data has more variation), ignoring case-only differences and ignoring data where Intl doesnt seem to provide correct data eg dv and sr_RS_latin.
data type locale Faker Intl weekday wide fa پتچشنبه,جمعه,چهارشنبه,دوشنبه,سه شنبه,شنبه,یکشنبه پنجشنبه,جمعه,چهارشنبه,دوشنبه,سهشنبه,شنبه,یکشنبه weekday wide he יום חמישי,יום ראשון,יום רביעי,יום שישי,יום שלישי,יום שני,שבת יום חמישי,יום ראשון,יום רביעי,יום שבת,יום שישי,יום שלישי,יום שני weekday wide pt_BR Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira weekday wide pt_PT Domingo,Quarta,Quinta,Sábado,Segunda,Sexta,Terça domingo,quarta-feira,quinta-feira,sábado,segunda-feira,sexta-feira,terça-feira weekday wide ur اتور,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ اتوار,بدھ,پیر,جمعرات,جمعہ,منگل,ہفتہ weekday wide zh_CN 星期二,星期六,星期三,星期四,星期天,星期五,星期一 星期二,星期六,星期日,星期三,星期四,星期五,星期一 weekday wide zh_TW 星期一,星期二,星期三,星期五,星期六,星期天,星期四 星期一,星期二,星期三,星期五,星期六,星期日,星期四 month wide ar آب,آذَار,أَيَّار,أَيْلُول,تِشْرِين ٱلْأَوَّل,تِشْرِين ٱلثَّانِي,تَمُّوز,حَزِيرَان,شُبَاط,كَانُون ٱلْأَوَّل,كَانُون ٱلثَّانِي,نَيْسَان أبريل,أغسطس,أكتوبر,ديسمبر,سبتمبر,فبراير,مارس,مايو,نوفمبر,يناير,يوليو,يونيو month wide ur اپریل,اکتوبر,اگست,جنوری,جولائ,جون,دسمبر,ستمبر,فروری,مارچ,مئ,نومبر اپریل,اکتوبر,اگست,جنوری,جولائی,جون,دسمبر,ستمبر,فروری,مارچ,مئی,نومبر month wide uz_UZ_latin Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktyabr,Sentyabr,Yanvar Aprel,Avgust,Dekabr,Fevral,Iyul,Iyun,Mart,May,Noyabr,Oktabr,Sentabr,Yanvar month wide vi Tháng Ba,Tháng Bảy,Tháng Chín,Tháng Giêng,Tháng Hai,Tháng Mười,Tháng Mười Hai,Tháng Mười Một,Tháng Năm,Tháng Sáu,Tháng Tám,Tháng Tư Tháng 1,Tháng 10,Tháng 11,Tháng 12,Tháng 2,Tháng 3,Tháng 4,Tháng 5,Tháng 6,Tháng 7,Tháng 8,Tháng 9 i think you really need a language speaker to check these manually. For example, in Chinese which I do know, both 星期天 and 星期日 are both perfectly valid words for "Sunday".
In Vietnamese it looks like we are spelling out the month numbers versus using digits.
In Uzbek there seems to be some debate about the correct transliteration for September and October - see https://daryo.uz/2018/01/15/sentabr-yoziladimi-yoki-sentyabr , perhaps @Mirazyzz can help clarify!
Month names are derived from Russian, and therefore IMO the proper usage would be "sentYAbr" to reflect the Russian "сентЯбрь." Since there is no direct equivalent of the letter "Я" in our language, we use the combination of "Y" and "A." However, the debate on this matter is still ongoing, and no official approach has been decided yet. Interestingly, some official school books use both variations on different pages.
Given this uncertainty, I suggest we choose the variation that minimizes our workload and simplifies our process.
@ST-DDT I have updated the code you reviewed. Thank you very much for your kind and detailed review. Thanks to you, I am learning a lot.
@ST-DDT I have updated it to reflect the parts you reviewed. ynnsuis is my company account.
I reduced the diff a bit, fixed an issue with the date/index.ts generation and generated the missing files for preview.
Sorry, for directly interfering with your PR.
- Creates unnecessary definitions for all descendant locales, e.g. en_AU, en_GB, en_HK when these should probably just inherit from en.
- Creates definitions for base locale, which it shouldn't
Currently, it does not because Intl uses de-de and we use de_DE.
This might have to be changed and then filtered for redundancy though.
@sossost Could you please fix that? (Please pull before changing any code)
- Doesn't make seperate arrays for wide and wide_context or abbr and abbr_context in locales where these are different e.g.
frorru- see documentation of "context" parameter at fakerjs.dev/api/date#month
I'm not sure whether I get you correctly here. The context variants arent touched or generated because AFAIK there is no Intl way to do that.
@ST-DDT No! thank you Actually, this is a bit of a difficult issue for me because I still lack a lot. Thank you very much for your direct and indirect help.
- Creates unnecessary definitions for all descendant locales, e.g. en_AU, en_GB, en_HK when these should probably just inherit from en.
- Creates definitions for base locale, which it shouldn't
Currently, it does not because
Intlusesde-deand we usede_DE. This might have to be changed and then filtered for redundancy though.@sossost Could you please fix that? (Please pull before changing any code)
- Doesn't make seperate arrays for wide and wide_context or abbr and abbr_context in locales where these are different e.g.
frorru- see documentation of "context" parameter at fakerjs.dev/api/date#monthI'm not sure whether I get you correctly here. The context variants arent touched or generated because AFAIK there is no Intl way to do that.
Sorry I didn't understand exactly what to do. Can you give me more hints??
Sorry I didn't understand exactly what to do. Can you give me more hints??
- Replace all
new Intl.DateTimeFormat({ locale, ...});
With
new Intl.DateTimeFormat({ locale: locale.replaceAll('_', '-'), ...});
- Find all instances where locale.date.weekday/month is the same as the parent locale. E.g. en is the parent of en_US and thus the locale files would be identical so the later should be skipped/not created. Some locales such as uz_UZ_latin dont have parents so they need the additional files.
With a big thanks to @sosost for their work so far, but I'm starting to feel trying to do this as a script that generates everything at once is maybe a mistake.
Instead perhaps we can modify it so it's run manually and you give it a single locale as a parameter, and it attempts to generate the days and months for that locale only.
That way we don't need to code in lots of edge cases, and it's easy to run it for missing locales, and review the diff for one locale at a time.
Once generated, it's not like the days of the week or months of the year will ever change ...
Sorry, for the delay. I will tackle this once I have some free time.
Ready for review.
The locales can be more easily reviewed using:
rm -rdf src/locales
git reset next -- src/locales
git restore src/locales
pnpm run preflight
git add src/locales
Added to the meeting notes for discussion.
I tested with Node 20.0.0 and 22.2.0. There are differences in data for en-AU, eo, es-MX, mk, uk and vi. Mostly minor things like tweaks to the shortened versions, capitalization etc. But enough to cause havoc with people running preflight with different node versions.
Team Notice
We talked about it in the team meeting, but we haven't been able to decide anything yet.
Team Decision
- We will not merge this, as it
- the data changes frequently
- it breaks our CI tests
- any variant of "only generating the data" produces unpredictable results depending on the used node version
- The data are easier to manually insert than describing how to run the custom script
Thanks for your contribution anyway ❤️