Update validate-tooling-data to eliminate case-insensitive duplicate languages
What kind of change does this PR introduce? Feature - Adds case-insensitive unique validation for language entries
Issue Number:
- Closes #1443
Screenshots/videos:
Deliberately introduced a casing mistake in a language name,
and the validator catches the mistake.
If relevant, did you update the documentation?
Summary:
This PR introduces case-insensitive unique validation for language entries in the tooling data to solve several existing problems:
- Inconsistent language casing across tools (e.g., "JavaScript" vs "javascript" vs "JAVASCRIPT")
- Potential confusion for users seeing the same language listed multiple times
My solution:
Implements a custom AJV keyword caseInsensitiveUnique that:
- Detects and reports case-insensitive duplicates using a Set
- Provides clear error messages for easy fixes
ajv.addKeyword({
  keyword: 'caseInsensitiveUnique',
  type: 'array',
  // Named function expression so that `validate.errors` below refers to this function.
  validate: function validate(schema, data) {
    if (!Array.isArray(data)) return false;
    // Collect every spelling as written, and every spelling lowercased.
    const languagesSet = new Set();
    const languagesLowercaseSet = new Set();
    data.forEach((tool) => {
      if (tool.languages) {
        tool.languages.forEach((language) => {
          languagesSet.add(language);
          languagesLowercaseSet.add(language.toLowerCase());
        });
      }
    });
    // Fewer lowercase entries means at least two spellings differ only by case.
    if (languagesSet.size !== languagesLowercaseSet.size) {
      console.error('Duplicate languages found');
      // Count how many distinct spellings map to each lowercase name.
      const lowercaseMap = new Map();
      languagesSet.forEach((language) => {
        lowercaseMap.set(
          language.toLowerCase(),
          (lowercaseMap.get(language.toLowerCase()) || 0) + 1
        );
      });
      lowercaseMap.forEach((value, key) => {
        if (value > 1) {
          console.log('Duplicate found for:', key);
        }
      });
      validate.errors = [{
        keyword: 'caseInsensitiveUnique',
        message: 'array contains case-insensitive duplicates',
        params: { keyword: 'caseInsensitiveUnique' }
      }];
      return false;
    }
    return true;
  }
});
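The detection logic can also be tried on its own, without AJV. The following is a minimal sketch, not code from this PR: `findCaseInsensitiveDuplicates` and `sampleTools` are hypothetical names, and the sample data is invented for illustration. It returns the groups of spellings that differ only by letter case, which is the information the keyword above logs to the console.

```javascript
// Hypothetical helper, not part of the PR: groups language spellings that
// differ only by letter case across all tools.
function findCaseInsensitiveDuplicates(tools) {
  const spellingsByKey = new Map(); // lowercase name -> Set of spellings seen
  for (const tool of tools) {
    for (const language of tool.languages ?? []) {
      const key = language.toLowerCase();
      if (!spellingsByKey.has(key)) spellingsByKey.set(key, new Set());
      spellingsByKey.get(key).add(language);
    }
  }
  // Keep only the names that were written in more than one way.
  return [...spellingsByKey.values()]
    .filter((spellings) => spellings.size > 1)
    .map((spellings) => [...spellings]);
}

// Invented sample data, shaped like the tooling entries described above.
const sampleTools = [
  { name: 'tool-a', languages: ['JavaScript', 'Go'] },
  { name: 'tool-b', languages: ['javascript'] },
  { name: 'tool-c' }, // no languages field
];

const duplicates = findCaseInsensitiveDuplicates(sampleTools);
console.log(duplicates); // [ [ 'JavaScript', 'javascript' ] ]
```

As in the keyword above, the same spelling used by several tools is not reported; only spellings that differ by case are.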
Does this PR introduce a breaking change?
✅ Yes
Impact:
This PR enforces case-insensitive uniqueness for language entries. Any existing tooling data that includes language names with inconsistent casing—such as "JavaScript" and "javascript"—will now fail validation. This change helps eliminate redundancy and confusion caused by duplicate entries with different letter cases.
Who is affected:
Tool maintainers and contributors who have added language entries with varying casing.
Migration Path:
Update your languages arrays to ensure that each language appears only once in a consistent format, preferably matching the casing defined in the schema enum. For example:
# ❌ Before
languages:
- "JavaScript"
- "javascript"
- "Go"
- "go"
# ✅ After
languages:
- "JavaScript"
- "Go"
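For larger data files, the cleanup could be scripted. This is a rough sketch under the assumption that the first spelling encountered is the one to keep; `dedupeLanguages` is a hypothetical helper, not something provided by this PR, and maintainers may prefer to pick the casing from the schema enum instead.

```javascript
// Hypothetical migration helper: removes case-insensitive duplicates from a
// single languages array, keeping the first spelling that appears.
function dedupeLanguages(languages) {
  const seen = new Set(); // lowercase names already kept
  return languages.filter((language) => {
    const key = language.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const cleaned = dedupeLanguages(['JavaScript', 'javascript', 'Go', 'go']);
console.log(cleaned); // [ 'JavaScript', 'Go' ]
```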
⚡ Cloudflare Pages Deployment
| Name | Status | Preview | Last Commit |
|---|---|---|---|
| website | ✅ Ready (View Log) | Visit Preview | ac51230d289640391c705f9aa13a3e90a70bb551 |
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (219521e) to head (ac51230).
Additional details and impacted files
@@ Coverage Diff @@
## main #1516 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 10 10
Lines 396 396
Branches 106 106
=========================================
Hits 396 396
Hey @Vishv04, can you tell me what change you made to the file 'validate-tooling-data.yml'? Is it necessary to change anything there? This PR is failing because it modifies that unauthorized file.
Hi @jagpreetrahi, I added a custom caseInsensitiveUnique rule in validate-tooling-data.yml to detect case-insensitive duplicates in the languages array—like treating "JavaScript" and "javascript" as the same. The logic checks if the input is an array, then builds two sets: one with original values and one with lowercase versions. If their sizes differ, it logs duplicates and throws a validation error.
@jviotti can I get your help reviewing this solution?
I don't think there is any reason to add a new keyword here. Why not just use pattern to set a regular expression that only allows lowercase strings?
Thank you @benjagm @jviotti for your response on this PR.
> I don't think there is any reason to add a new keyword here. Why not just use pattern to set a regular expression that only allows lowercase strings?
Yes, using a pattern works, but it feels a bit like we're controlling user input, since users will naturally write "Javascript" instead of "javascript" (just my assumption, it might be wrong). I created the custom keyword to avoid forcing users to change how they enter data. Still, using a pattern could also be a workable option.
I really suggest just using pattern to avoid the extra complexity of a new keyword (mainly with AJV). The convention can be to just force everybody to write languages in lowercase.
Okay @jviotti, I will use pattern and fix this issue. Thank you for your help.
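For reference, a minimal sketch of what the pattern-based approach might look like. The thread does not specify the regular expression, so the pattern below (no ASCII uppercase letters) and the name `languagesSchema` are assumptions made for illustration. Once every entry is lowercase, the standard `uniqueItems` keyword is enough to catch repeats within an array. JSON Schema patterns follow ECMA-262 regular expression syntax, so a plain `RegExp` can mirror the check.

```javascript
// Hypothetical schema fragment; the exact pattern is an assumption,
// not something agreed on in this thread.
const languagesSchema = {
  type: 'array',
  uniqueItems: true,
  items: { type: 'string', pattern: '^[^A-Z]+$' },
};

// Mirror the schema's pattern check with a plain RegExp.
const lowercaseOnly = new RegExp(languagesSchema.items.pattern);

// Returns the entries that would fail the pattern.
function invalidLanguages(languages) {
  return languages.filter((language) => !lowercaseOnly.test(language));
}

const rejected = invalidLanguages(['javascript', 'Go', 'c++', 'TypeScript']);
console.log(rejected); // [ 'Go', 'TypeScript' ]
```

Note that this pattern only rejects ASCII uppercase letters; names containing other uppercase characters would need a broader expression.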