romcal icon indicating copy to clipboard operation
romcal copied to clipboard

refactor(util): improve data checks and update teresa_of_calcutta_virgin translations

Open tukusejssirs opened this issue 1 month ago • 10 comments

Summary

This PR contains two changes:

1. Change unused martyrology items check to informational

  • Update data-checks.ts to change the "useless martyrology items" check from an error to an informational message
  • Valid martyrology entries should not be restricted even if not yet used by any calendar - they may be reserved for future use
  • Renumber the check comments to reflect the new order

2. Update teresa_of_calcutta_virgin translations

Update translations and add source references for teresa_of_calcutta_virgin:

  • de: change Kalkutta to Kolkata
  • en: add source reference
  • es: add source reference
  • fr: add translation with source
  • it: add translation with source
  • pl: add translation with source

Sources:

  • https://press.vatican.va/content/salastampa/it/bollettino/pubblico/2025/02/11/0125/00250.html
  • https://liturgia.wiara.pl/doc/9143462.Swieta-Teresa-z-Kalkuty-w-Powszechnym-Kalendarzu-Rzymskim

tukusejssirs avatar Dec 06 '25 20:12 tukusejssirs

@emagnier, actually, do we need to output informational messages about extra martyrology items? I would just remove the test if you and @TobiTenno agree.

I would also remove the check of No useless localized name items in any locale: if we keep them in martyrology.ts, let’s keep them the locales too.

tukusejssirs avatar Dec 07 '25 17:12 tukusejssirs

I have found in rites/roman1969/src/calendars/countries/france/diocese-of-coutances.ts that all_holy_bishops_of_the_diocese_of_coutances is used used in martyrology prop of the all_holy_bishops_and_priests_of_the_diocese_of_coutances celebration. While I don’t why would we need to split such a celebration by the order of the priesthood (I’d split collective celebrations only if there are known individual persons that are part of that collective, e.g. in this case all know French priests and bishops which would be too long a list anyway; imagine that list for all_saints :infinity:), there are some other celebrations where this makes. The issue is that all_holy_bishops_of_the_diocese_of_coutances is not localised in en.ts, but it is in martyrology.ts as a separate item.

I suggest that we not only require all celebration IDs used at the top-level props in calendars to be defined in en.ts, but also all martyrology items.

Also I think that all celebration IDs in en.ts should be also defined in martyrology.ts (e.g. adalbert_of_prague_bishop_patron_of_poland used in poland/index.ts is not defined in ). Below is a list of celebrations defined in en.ts which are missing from martyrology.ts; some of them have patronage title added, others are composed of multiple persons or with companions, however, there are still some explicitly missing celebrations like canada_day or dedication_of_consecrated_churches or dedication_of_the_cathedral_basilica_of_saint_denis_france). We should somehow make sure everything is defined, like we can remove patronage titles using *patron* or companions using *_companions*. Unfortunately, there is one multi-personal celebrations (where two or more persons are named explicitly) that does not contain _and_ as it should (it makes it harder to remove them): alban_of_britain_julius_of_caerleon_aaron_of_caerleon_martyrs (I suggest to rename it to alban_of_britain_julius_of_caerleon_and_aaron_of_caerleon_martyrs).

Missing from martyrology.ts, but defined in en.ts
  1. adalbert_of_prague_bishop_patron_of_poland
  2. adeline_de_mortain_abbess_and_the_saints_of_savigny
  3. aidan_of_lindisfarne_bishop_and_the_saints_of_lindisfarne
  4. alban_of_britain_julius_of_caerleon_aaron_of_caerleon_martyrs
  5. all_holy_bishops_and_priests_of_the_diocese_of_coutances
  6. all_holy_bishops_of_the_archdiocese_of_lyon
  7. all_saints_of_the_archdiocese_of_lyon
  8. aloysius_versiglia_bishop_and_callistus_caravario_priest_martyrs
  9. andrew_apostle_patron_of_russia
  10. andrew_apostle_patron_of_scotland
  11. andrew_de_soveral_and_ambrose_francis_ferro_priests
  12. andrew_dung_lac_priest_and_companions_martyrs
  13. andrew_kim_tae_gon_priest_paul_chong_ha_sang_and_companions_martyrs
  14. andrew_zorard_of_nitra_and_benedict_of_skalka_hermits
  15. anthony_julian_nowowiejski_bishop_and_companions_martyrs
  16. arbogast_of_strasbourg_bishop_patron_of_the_archdiocese_of_strasbourg
  17. assumption_of_the_blessed_virgin_mary_patroness_of_france
  18. augustine_zhao_rong_priest_and_companions_martyrs
  19. basil_the_great_and_gregory_nazianzen_bishops
  20. benedict_of_nursia_abbot_patron_of_europe
  21. boris_of_kiev_and_gleb_of_kiev_martyrs
  22. bridget_of_sweden_religious_copatroness_of_europe
  23. brigid_of_kildare_virgin_copatroness_of_ireland
  24. canada_day
  25. canute_iv_of_denmark_eric_ix_of_sweden_and_olaf_ii_of_norway_martyrs
  26. catherine_of_siena_virgin_copatroness_of_europe
  27. catherine_of_siena_virgin_copatroness_of_italy_and_europe
  28. ceslaus_of_poland_and_hyacinth_of_poland_priests
  29. chad_of_mercia_and_cedd_of_lastingham_bishops
  30. charles_lwanga_and_companions_martyrs
  31. charles_spinola_and_jerome_de_angelis_priests
  32. christopher_magallanes_priest_and_companions_martyrs
  33. clement_of_ohrid_and_gorazd_of_moravia_bishops_and_companions
  34. columba_of_iona_abbot_copatron_of_ireland
  35. conrad_of_constance_and_gebhard_of_constance_bishops
  36. cornelius_i_pope_and_cyprian_of_carthage_bishop_martyrs
  37. cosmas_of_cilicia_and_damian_of_cilicia_martyrs
  38. cyril_constantine_the_philosopher_monk_and_methodius_michael_of_thessaloniki_bishop
  39. cyril_constantine_the_philosopher_monk_and_methodius_michael_of_thessaloniki_bishop_copatrons_of_europe
  40. cyril_constantine_the_philosopher_monk_and_methodius_michael_of_thessaloniki_bishop_slavic_missionaries_copatrons_of_europe
  41. david_of_wales_bishop_patron_of_wales
  42. dedication_of_consecrated_churches
  43. dedication_of_the_cathedral_basilica_of_saint_denis_france
  44. dedication_of_the_cathedral_of_notre_dame_de_strasbourg_france
  45. dedication_of_the_cathedral_of_saint_john_the_baptist_lyon_france
  46. dedication_of_the_notre_dame_de_paris_cathedral_paris_france
  47. denis_of_paris_bishop_and_companions_martyrs
  48. denis_of_paris_bishop_patron_of_the_archdiocese_of_paris
  49. denis_of_paris_bishop_patron_of_the_city_and_of_the_diocese_of_saint_denis
  50. epipodius_of_lyon_and_alexander_of_lyon_martyrs
  51. eugenia_of_alsace_and_attala_of_alsace_virgins
  52. florian_of_lorch_and_companions_martyrs
  53. francis_de_sales_bishop_patron_of_the_clergy_of_the_archdiocese_of_lyon
  54. francis_diaz_del_rincon_priest_and_companions_martyrs
  55. francis_of_assisi_patron_of_italy
  56. fructuosus_of_braga_martin_of_braga_and_gerald_of_braga_bishops
  57. fructuosus_of_tarragona_bishop_and_augurius_of_tarragona_and_eulogius_of_tarragona_deacons_martyrs
  58. george_of_lydda_martyr_patron_of_england
  59. gorazd_of_moravia_and_companions
  60. gregory_grassi_francis_fogolla_and_anthony_fantosati_bishops_and_companions_martyrs
  61. henry_ii_emperor_and_cunigunde_of_luxembourg
  62. hilarius_of_toulouse_bishop_and_sylvius_of_toulouse_bishops
  63. ignatius_de_azevedo_priest_and_companions_martyrs
  64. immaculate_conception_of_the_blessed_virgin_mary_patroness_of_the_philippines
  65. immaculate_conception_of_the_blessed_virgin_mary_patroness_of_the_usa
  66. irenaeus_of_lyon_bishop_patron_of_the_archdiocese_of_lyon_and_companions_martyrs
  67. jacinta_marto_and_francisco_marto
  68. jacques_francois_lefranc_martyr_and_martyrs_of_the_revolution
  69. jacques_jules_bonnaud_priest_and_companions_martyrs
  70. james_apostle_patron_of_spain
  71. jean_chevillard_and_companions_martyrs
  72. jean_robert_queneau_and_companions_martyrs
  73. joachim_and_anne_parents_of_mary
  74. joachim_and_anne_patroness_of_the_province_of_quebec_parents_of_mary
  75. joan_of_arc_virgin_copatroness_of_france
  76. john_de_brebeuf_isaac_jogues_priests_and_companions_martyrs
  77. john_de_brebeuf_isaac_jogues_priests_and_companions_martyrs_copatrons_of_canada
  78. john_de_brebeuf_priest_and_companions_martyrs
  79. john_fisher_bishop_and_thomas_more_martyrs
  80. joseph_spouse_of_mary_patron_of_canada
  81. kilian_of_wurzburg_bishop_and_companions_martyrs
  82. labor_day
  83. laud_of_coutances_bishop_patron_of_the_diocese_of_coutances
  84. laurence_wang_bing_and_companions_martyrs
  85. lawrence_ruiz_and_companions_martyrs
  86. leo_ignatius_mangin_priest_and_companions_martyrs
  87. marcellinus_of_rome_and_peter_the_exorcist_martyrs
  88. margaret_clitherow_anne_line_and_margaret_ward_virgin_martyrs
  89. marko_krizin_melchior_grodziecki_and_stephen_pongracz_priests
  90. martha_of_bethany_mary_of_bethany_and_lazarus_of_bethany
  91. martin_wu_xuesheng_and_companions_martyrs
  92. mary_stella_of_the_blessed_sacrament_mardosewicz_and_companions_virgins
  93. maternus_of_cologne_valerius_of_trier_and_eucharius_of_trier_bishops
  94. maurice_of_agaunum_and_companions_martyrs
  95. maurice_of_agaunum_and_companions_martyrs_patron_of_the_diocese_of_angers
  96. maurilius_of_angers_bishop_second_patron_of_the_diocese_of_angers
  97. mederic_of_autun_and_droctoveus_of_autun_abbots
  98. michael_gabriel_and_raphael_archangels
  99. modestus_andlauer_and_andrew_bauer_martyrs
  100. nereus_of_terracina_and_achilleus_of_terracina_martyrs
  101. nykyta_budka_and_vasyl_velychkovsky_bishops
  102. odile_of_alsace_abbess_patroness_of_alsace
  103. our_lady_mother_of_divine_providence_patroness_of_puerto_rico
  104. our_lady_of_angels_patroness_of_costa_rica
  105. our_lady_of_aparecida_patroness_of_brazil
  106. our_lady_of_guadalupe_patroness_of_the_americas
  107. our_lady_of_guadalupe_patroness_of_the_philippines
  108. our_lady_of_hungary_patroness_of_hungary
  109. our_lady_of_la_salette
  110. our_lady_of_lujan_patroness_of_argentina
  111. our_lady_of_mount_carmel_mother_and_queen_of_chile
  112. our_lady_of_sorrows_patroness_of_slovakia
  113. paternus_of_avranches_bishop_and_scubilion_abbot
  114. patrick_of_ireland_bishop_patron_of_ireland
  115. paul_chen_changpin_and_companions_martyrs
  116. paul_miki_and_companions_martyrs
  117. perpetua_of_carthage_and_felicity_of_carthage_martyrs
  118. peter_and_paul_apostles
  119. peter_baptist_blasquez_paul_miki_and_companions_martyrs
  120. peter_chanel_priest_patron_of_oceania
  121. peter_de_zuniga_and_louis_flores_priests
  122. peter_kibe_priest_and_companions_martyrs
  123. peter_poveda_and_innocent_of_mary_immaculate_canoura_priests_and_companions_martyrs
  124. philip_and_james_apostles
  125. philip_evans_and_john_lloyd_priests
  126. philip_of_jesus_de_las_casas_paul_miki_and_companions_martyrs
  127. pontian_i_pope_and_hippolytus_of_rome_priest
  128. pothinus_of_lyon_bishop_blandina_of_lyon_virgin_and_companions_martyrs
  129. pothinus_of_lyon_bishop_patron_the_city_of_lyon_blandina_of_lyon_virgin_and_companions_martyrs
  130. raymond_costeran_and_companions_martyrs
  131. roch_gonzalez_alphonsus_rodriguez_and_john_del_castillo_priests
  132. rose_of_lima_virgin_copatroness_of_the_philippines
  133. rupert_of_salzburg_and_virgilius_of_salzburg_bishops
  134. sancha_of_portugal_and_mafalda_of_portugal_virgins
  135. simon_and_jude_apostles
  136. sixtus_ii_pope_and_companions_martyrs
  137. six_welsh_martyrs_and_companions
  138. stanislaus_of_szczepanow_bishop_patron_of_poland
  139. stephen_the_first_martyr_and_principal_patron_of_the_archdiocese_of_toulouse
  140. teresa_benedicta_of_the_cross_stein_virgin_copatroness_of_europe
  141. thanksgiving_day
  142. therese_of_the_child_jesus_and_the_holy_face_of_lisieux_virgin_copatroness_of_france
  143. thomas_hioji_rokuzayemon_nishi_priest_and_companions_martyrs
  144. thursday_of_the_lords_supper
  145. timothy_of_ephesus_and_titus_of_crete_bishops
  146. translation_of_the_relics_of_odile_of_alsace_abbess
  147. ursula_of_cologne_and_companions_virgins
  148. vincent_lewoniuk_and_companions_martyrs
  149. wenceslaus_i_of_bohemia_martyr_patron_of_the_czech_nation
  150. wladyslaw_bladzinski_priest_and_companions_martyrs

After removing all celebrations that match *(_and_|_companions|patron)*, only 14 celebrations remain (incl alban_of_britain_julius_of_caerleon_aaron_of_caerleon_martyrs that should be renamed):

Missing celebrations
  1. [RENAME] alban_of_britain_julius_of_caerleon_aaron_of_caerleon_martyrs
  2. all_holy_bishops_of_the_archdiocese_of_lyon
  3. all_saints_of_the_archdiocese_of_lyon
  4. canada_day
  5. dedication_of_consecrated_churches
  6. dedication_of_the_cathedral_basilica_of_saint_denis_france
  7. dedication_of_the_cathedral_of_notre_dame_de_strasbourg_france
  8. dedication_of_the_cathedral_of_saint_john_the_baptist_lyon_france
  9. dedication_of_the_notre_dame_de_paris_cathedral_paris_france
  10. labor_day
  11. our_lady_of_la_salette
  12. thanksgiving_day
  13. thursday_of_the_lords_supper
  14. translation_of_the_relics_of_odile_of_alsace_abbess

WDYT, @emagnier and @TobiTenno?

tukusejssirs avatar Dec 07 '25 21:12 tukusejssirs

Some thoughts on the points discussed here:

  1. The data:check task is mainly meant to run in CI to ensure that our data is clean and complete when a PR is submitted. So I agree that an informational message can be removed. Maybe in the future we’ll have a dedicated task to generate statistics, and such informational messages would make more sense there.

  2. I also agree that any martyrology entry referenced in a LiturgicalDay should indeed be translated.

  3. Regarding all_holy_bishops_and_priests_of_the_diocese_of_coutances, I created two separate martyrology entries to distinguish the titles of these two groups of people.

  4. I’m not opposed to using Jest tests for some checks, but I think data:check should remain a separate task for now. Tests should stay focused on code behavior, not on validating large datasets. A future ESLint plugin could centralize these data checks, provide CI validation, and offer immediate IDE feedback, which seems like a better long-term solution.

emagnier avatar Dec 07 '25 22:12 emagnier

  1. The data:check task is mainly meant to run in CI to ensure that our data is clean and complete when a PR is submitted. So I agree that an informational message can be removed. Maybe in the future we’ll have a dedicated task to generate statistics, and such informational messages would make more sense there.

I agree with the statistics generation, I have talked some stuff like that before (incl generation of calendar propers, localisation coverage). :+1:

  1. I also agree that any martyrology entry referenced in a LiturgicalDay should indeed be translated.

:+1: Perfect, I’ll a test for that in this PR.

  1. Regarding all_holy_bishops_and_priests_of_the_diocese_of_coutances, I created two separate martyrology entries to distinguish the titles of these two groups of people.

Do you have use case for that? What is the advantage of distinguishing between bishops and priests in the celebration? :thinking:

  1. I’m not opposed to using Jest tests for some checks, but I think data:check should remain a separate task for now.

Why should it remain as a separate task for now? :thinking: If we move the checks to unit tests, the checks/tests would be run in the CI anyway.

Tests should stay focused on code behavior, not on validating large datasets.

I disagree with this statement.

IMHO unit tests should test anything related to our code (i.e. romcal in this case). And the data must be validated, as otherwise we can so much bad data and bad data results in inferior quality of romcal which results in hardly noticable bugs.

A future ESLint plugin could centralize these data checks, provide CI validation, and offer immediate IDE feedback, which seems like a better long-term solution.

As I have stated in this comment, some checks from data:check cannot be ported to ESLint because ESLint can process one file at a time, but some checks must use various/many files. I don’t say that some checks cannot be done at all, but they can be done either in a limited or very complex or resource-intensive way.

Either way, previously we have approved that the data (sources, martyrology.ts/catalogue, calendars with celebrations, localisations) would be moved to a separate package, then we could move these changes out there and in the core package and calendar revision packages would test the code part only.

tukusejssirs avatar Dec 07 '25 22:12 tukusejssirs

The two separate martyrology entries for all_holy_bishops_and_priests_of_the_diocese_of_coutances allow flexibility for future features, localized displays, and better data organization/searchability, which could add long-term value.

I fully agree that data must be validated in CI, that’s not in question. My point is about separation of concerns: unit tests are meant to verify a single functionality or small scope of code, while heavy multi-file data validations are better handled by data:check. This keeps tests fast, readable, and maintainable, while still ensuring data integrity.

To clarify my previous point regarding ESLint: it was suggested as a possible way to complement data:check and provide IDE feedback for simpler checks. I agree that multi-file validations cannot be fully moved there, but it remains an interesting option for future improvements. data:check continues to be the main mechanism for complex validations.

emagnier avatar Dec 07 '25 23:12 emagnier

The two separate martyrology entries for all_holy_bishops_and_priests_of_the_diocese_of_coutances allow flexibility for future features, localized displays, and better data organization/searchability, which could add long-term value.

I still don’t see much value in having two separate martyrology entries. IMHO the localisation of this particular celebration will always be done at the all_holy_bishops_and_priests_of_the_diocese_of_coutances level, but okay, you want what you’t talking about and it is out of the scope of this PR. :man_shrugging:

My point is about separation of concerns: unit tests are meant to verify a single functionality or small scope of code, while heavy multi-file data validations are better handled by data:check. This keeps tests fast, readable, and maintainable, while still ensuring data integrity.

As for the fast tests: all tests (incl my new tests and coverage) take about ~22 seconds to run and there 3 failing suites and 23 failing tests (the more output, the longer it takes). Based on the CI logs (where it look ~20 seconds, it add a second for many tests. And when we move to Vitest (which is my desire), it could be even faster. YMMV

As for readability and maintainability: that all depends on the tests author and preferences.

To clarify my previous point regarding ESLint: it was suggested as a possible way to complement data:check and provide IDE feedback for simpler checks.

My idea is to use a single, consistent way to test our data and code.

As for IDE feedback, yes, I get that lint errors/warnings could be useful. How important that is, that is another question.

I agree that multi-file validations cannot be fully moved there, but it remains an interesting option for future improvements.

I consider that a crucial thing to test them now, therefore, I am creating the unit tests.

data:check continues to be the main mechanism for complex validations.

I frankly disagree with that.


Another possible issue: why don’t we localise all these periods, but only epiphany and holy_week in the locales?

  • ChristmasOctave
  • DaysBeforeEpiphany
  • DaysFromEpiphany
  • ChristmasToPresentationOfTheLord
  • PresentationOfTheLordToHolyThursday
  • HolyWeek
  • EasterOctave
  • EarlyOrdinaryTime
  • LateOrdinaryTime

tukusejssirs avatar Dec 07 '25 23:12 tukusejssirs

data:check continues to be the main mechanism for complex validations.

i agree that these should become tests, and we should just run tests to validate data before we publish, it keeps the step as a standard lifecycle hook instead of a completely custom one

For all the rest of that, let's do those other checks that we 'd want to move (or other modifications) in a separate PR, please. this one was fairly straightforward, just touching this part of locales. if we want more enhancements, let's do it in a separate PR.

TobiTenno avatar Dec 08 '25 00:12 TobiTenno

To let you guys review the WIP changes (if you want), I have just pushed a commit with all the tests I have created.

Pending tasks:

  • [ ] review the tests for false positives/negatives;
  • [ ] fix the found issues (I think of skipping the addition of Brazilian celebrations and localisations which I have already mostly fixed in #993);
  • [ ] remove data:checks NPM script along with the JS script file and its call in the CI.

OFC we need to have the consensus on some of the parts, that is the main reason I want to share with you the tests I have created before I consider them done.


Issue the tests have found:

  1. Martyrology Data (2 failures):
  • Missing translations in en.ts for some used martyrology keys;
  • Martyrology catalog keys not sorted alphabetically.
  1. Locale Meta Keys (7 failures):
  • cs.ts: Missing cycles and ordinals
  • es.ts: Missing cycles
  • la.ts: Missing cycles
  • pl.ts: Missing cycles and ordinals
  1. Locale Calendar Keys (3 failures):
  • pt-br.ts: Missing GRC keys and Brazil calendar keys
  • es.ts: Missing keys for Spanish-speaking country calendars
  1. Calendar Chronological Sorting (25 failures):
  • Celebrations not sorted chronologically in many calendars like Bolivia, BosniaHerzegovina, Brazil, CostaRica, England, Finland.

i agree that these should become tests, and we should just run tests to validate data before we publish, it keeps the step as a standard lifecycle hook instead of a completely custom one

I agree with this. :+1:

For all the rest of that, let's do those other checks that we 'd want to move (or other modifications) in a separate PR, please. this one was fairly straightforward, just touching this part of locales. if we want more enhancements, let's do it in a separate PR.

To be honest, I wanted to expand the scope of this PR, as a check in data:checks blocked me, thus I wanted to fix it by refactoring it to unit tests, but along the line, I have found a few issues (bugs in the data:checks script, missing stuff, etc), therefore, I tried to do it in a more robust way, while still focusing on the same checks for martyrology.ts, calendars and locales.

Could you make a quick (not thorough) review of the changes I have made in the latest commit please? Does it seem to much of the scope broadening? If yes, what would you move in a separate PR?

I personally would keep all changes from the latest commit and update the PR title and description instead. However, I am open for merging the changes in any number of PRs. :man_shrugging:

tukusejssirs avatar Dec 08 '25 00:12 tukusejssirs

@emagnier, thanks for the thoughtful feedback on my changes. I’ll fix you findings them ASAP (last week I’ve been busy both at work and at home). However, note that I shared the latest commit just to be able to discuss the refartoring of data:checks to unit tests; I know there are still some rough edges and I personally didn’t actually review my own code yet (that’s why I have converted the PR to a draft for now). :wink:

I want to address your point #4 in more detail since we seem to have different perspectives on where data checks should live. Let me share some research I did (using Claude Code / AI) on this topic.

TL;DR

I believe consolidating data checks into Jest is the right approach for romcal, and I think we already have established precedent for this in our own codebase.


1. We Already Use Jest for Data Validation

Before looking at external examples, let’s look at what we already do in romcal:

Snapshot Testing of Calendar Data

In calendar-builder.test.ts:521-541, we have:

describe('Testing calendar snapshots', () => {
  test('General Roman Calendar in gregorian year', async () => {
    const gregorianCalendar2020 = await new Romcal().generateCalendar(2020);
    const gregorianCalendar2021 = await new Romcal().generateCalendar(2021);
    const gregorianCalendar2022 = await new Romcal().generateCalendar(2022);

    expect(gregorianCalendar2020).toMatchSnapshot();
    expect(gregorianCalendar2021).toMatchSnapshot();
    expect(gregorianCalendar2022).toMatchSnapshot();
  });
  // ...
});

According to git blame, this code was written by you on 12 Nov 2022. :wink:

This is literally data validation using Jest. We’re validating that the generated calendar data matches expected snapshots across multiple years. If ‘[t]ests should stay focused on code behavior, not on validating large datasets,’ then this existing code contradicts that principle.

The New Data Tests Follow the Same Pattern

The new tests in martyrology-data.test.ts, locale-data.test.ts, and calendar-data.test.ts are doing exactly the same thing: validating that our data is correct, complete, and consistent. They’re a natural extension of what we already do.


2. External Examples: This Is Common Practice

I did some research, and using test frameworks for data validation is actually very common.

Major Data Quality Frameworks

Both frameworks explicitly embrace the concept of unit tests for data. This isn’t a fringe idea—it’s a recognised best practice in data engineering.

Jest-Specific Examples

  • jest-json-schema (American Express) - Provides toMatchSchema matcher for validating JSON data against schemas in Jest tests.

  • i18n.test.js Gist - Jest tests specifically for:

    • Checking missing translation keys
    • Finding unused translation keys
    • Validating translation completeness across languages

    This is almost exactly what our locale validation tests do!

  • babel-plugin-tester - Uses Jest to validate fixture files and transformations. The author explicitly prefers fixture-based testing to inline code because it’s easier to read with syntax highlighting.

i18next Community Practices

In i18next community discussions, developers recommend building automated testing frameworks that compare translation keys across languages to ensure completeness. Many teams implement this directly in Jest.


3. The ESLint Plugin Argument

You mentioned:

A future ESLint plugin could centralize these data checks, provide CI validation, and offer immediate IDE feedback

I see the appeal, but I have concerns:

ESLint Processes One File at a Time

Some of our data checks need to cross-reference multiple files:

  • Martyrology validation - needs to check that keys used in calendars exist in the martyrology catalogue
  • Locale validation - needs to compare calendar definitions against locale files
  • Calendar sorting - needs to traverse inheritance chains across multiple calendar files

ESLint rules receive one file at a time. To do cross-file validation, you’d need to either:

  1. Build complex caching mechanisms that persist data between file analyses
  2. Load external files during rule execution (resource-intensive)
  3. Use ESLint’s newer meta.docs.url pattern with custom validators (still complex)

In other words, it would require significant effort to achieve what Jest already does out of the box.

It’s Speculative Future Work

Building a custom ESLint plugin is a significant engineering effort. We’d need to:

  • Design the plugin architecture
  • Implement custom rules
  • Handle the cross-file challenges
  • Maintain two systems during transition

Meanwhile, Jest is already set up, running in CI, and familiar to contributors.

IDE Feedback Is Nice-to-Have

Yes, real-time feedback would be great. But our data doesn’t change frequently — it’s not like code where you’re constantly making changes. Running npm test after modifying data files is perfectly acceptable.


4. Devil’s Advocate: Arguments for Separation

To be fair, here are the legitimate arguments for keeping data checks separate:

Argument Counter-argument
Conceptual separation: unit tests = code, data checks = data We already mix these with snapshot tests. The boundary is fuzzy in practice.
Different failure modes: a test failure vs. data quality issue Jest’s output clearly shows which test failed. We can use descriptive test names.
Slower tests: data checks might be slower Our data checks run in milliseconds. Not a real concern.
Different audiences: devs vs. data maintainers Everyone runs npm test. Having one command is simpler.

The strongest argument is conceptual purity. But I think practical benefits outweigh theoretical elegance here.


5. Practical Benefits of Jest Integration

Benefit Explanation
Single test runner One npm test command runs everything
Unified CI integration No need to manage multiple NPM scripts in CI
Consistent reporting Same output format for all validations
Watch mode npm test -- --watch works for data files too
Filtering Can run specific data tests with -t flag
Coverage reports Data validation is tracked in coverage

6. Proposed Compromise

If the separation concern is really important to you, we could:

  1. Keep all data validation in Jest
  2. Place data tests in a clearly separated location: __tests__/data-validation/ or files named *.data.test.ts
  3. Add a separate npm script: npm run test:data that filters to only data tests

This gives us:

  • ✅ Single test runner and CI integration
  • ✅ Clear logical separation in the file structure
  • ✅ Ability to run data checks separately if needed
  • ✅ All the tooling benefits of Jest

Conclusion

I think we should consolidate data checks into Jest because:

  1. We already do it — snapshot tests validate calendar data
  2. It’s common practice — Deequ, Great Expectations, jest-json-schema, i18n validation gists all show this pattern
  3. ESLint is impractical — cross-file validation is complex, and it’s speculative future work
  4. Practical benefits — one command, one CI step, familiar tooling

What do you think? Happy to discuss further or implement a compromise approach!

tukusejssirs avatar Dec 16 '25 20:12 tukusejssirs

i generally agree that unit tests are a good clean way of doing what we're wanting. trying to make lint rules work (a large lift of its own) for this for live feedback in a definition that will most certainly change in the near future as we try to get more information into locale and definitions seems like folly to me.

TobiTenno avatar Dec 17 '25 03:12 TobiTenno