
Add an `id` property to all test cases

Open Julian opened this issue 2 years ago • 22 comments

Problem: Tests in the suite are occasionally moved around within a file, their descriptions are clarified, or they are modified because they have a bug. Given that the intended use of the test suite is that downstream users -- all implementations, Bowtie, other research use cases -- be able to run the tests, they have no way to "track" a test over time. Specifically, say an implementation has a known issue with a specific test for $dynamicRef -- a test case called "$dynamicRef flubs the bub when you bla bla". An implementer who wishes to mark this test as a TODO in their own test suite has no reliable way to do so over the long term, because its name, its position within the file, or even the file itself may change upstream for various reasons.

We should therefore introduce a stable ID which we guarantee will never change for a test, even if its description changes, or if the test is moved elsewhere within the file or suite.

A "slug" is likely the simplest choice structure-wise, in that it's derivable from the existing test descriptions. UUIDs or globally incrementing integers are other options (see below for considerations).

This addition should also include a sanity check ensuring that the IDs have the properties we wish of them (i.e. uniqueness, see below).
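Such a sanity check could be a small script run in CI. Here's a minimal sketch in Python, assuming tests live in per-version directories of JSON files and the property is named `id` (both of which are assumptions at this stage):

```python
import json
from collections import Counter
from pathlib import Path

def duplicate_ids(version_dir):
    """Collect every test case "id" in a version's directory and return any duplicates."""
    ids = []
    for path in sorted(Path(version_dir).glob("*.json")):
        for case in json.loads(path.read_text()):
            if "id" in case:
                ids.append(case["id"])
    return [value for value, count in Counter(ids).items() if count > 1]
```

A non-empty result would fail the build.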

For context, we had previously added this exact concept to the test case schema in PR #53, but never ended up adding it to the tests themselves.

Considerations:

  • The IDs should likely be version-wide unique -- i.e. not simply unique within one file, though also not necessarily across the whole test suite. This is to ensure we can move tests between files if we ever recategorize them (c.f. e.g. #630, which we ultimately haven't done, but we have in other cases). This would mean a true "global" ID for a test would be the composite (version, ID).
  • We should spend a moment considering whether IDs should be historically unique as well -- i.e. if we delete a test for whatever reason, not reuse the ID we gave it.
  • If a test was incorrect, we should likely change its ID, given that anyone referencing it has possibly skipped it because they indeed had the correct behavior -- but this may need a bit more thought.
  • We may need to be more precise in the future when we modify a test: is the new test the "same" (and therefore should keep its ID) or "new" (in which case it should get a fresh one)? In general we should err heavily towards never modifying tests in any non-trivial way, so this hopefully will never come up "in practice", but it undoubtedly has once or twice previously.
  • We need to consider what to do about tests themselves (rather than test cases) -- one would suspect we could rely on the ordering of the test array as IDs for tests, but again, this means we are not able to reliably reorder tests within a test case without disrupting downstream or forcing reliance on their test names. It seems initially then like we should include IDs for tests as well.
  • The test case schema will need updating to document and allow this new property. We should initially make the property optional until all tests in the suite have it, at which point we can then require the property.
  • We should probably write the above considerations down in the README when merging a PR implementing this.
  • Please do not send a giant PR which does this for all files all at once. Send small PRs, perhaps one file at a time, or even less, to help reviewers be able to review the PR reasonably.

Julian avatar Oct 31 '23 15:10 Julian

If you'd like to assign me to this, I'll take it on.

I propose the following in response to this:

  1. Use UUIDs for the IDs, and make them globally unique, rather than just unique within the version of the spec.
  2. Using array indexes for the tests themselves is extremely brittle -- people keep arbitrarily adding tests in the middle of the lists, and it is difficult to enforce. I propose using a UUID for each test within the test suite.
  3. I will write a tool to update the IDs for everything and land a complete update all in one go
  4. If a test changes in any way, I propose that its ID should be changed, and we should document that in the README
  5. Note that with respect to (4) a test that relies on the metaschema should not be considered to be changed if the metaschema changes.

mwadams avatar Nov 15 '23 10:11 mwadams

I originally presumed we would have some sort of human readable ID. UUIDs would be simpler in some respects. I guess it depends on the intended use case.

If someone has a list of tests they skip, human-readable names which partially suggest the sort of test in question could be helpful. However, this wouldn't work well if we agree that IDs should change when the test changes (which I think is a good idea).

In that case, I wonder if a simpler approach is to just MD5 the test object... although I suspect because of JSON and object ordering, that might not end up the same depending on various things.

All that to say, are we convinced UUIDs as IDs would be the best here?

Relequestual avatar Nov 15 '23 11:11 Relequestual

I did consider an MD5 hash of the test object (and the schema in the case of the test case) but that makes it rather harder to generate while developing the tests, and feels like a barrier to entry.

mwadams avatar Nov 15 '23 14:11 mwadams

I don't really have any strong preference on the structure of the IDs -- I think I mentioned I think slugs are better because of the extra benefit (of being user readable and usable for other reasons in isolation) but if I'm not doing the work I definitely don't care.

The only thing I do care about is:

I will write a tool to update the IDs for everything and land a complete update all in one go

I don't personally want to (ever) review large automated PRs -- as I've said elsewhere, in my experience it's simply impossible for any human to do well. Experience in this repo itself has reaffirmed that view: the last two times I made this point (no large automated PRs) and others reviewed them anyway, the PRs indeed introduced subtle test bugs that were fixed over the following few months (not that it took months to fix them -- it took months to notice them, which is often the problem). So I'm happy to review the script you write to do this, or else for someone else who thinks such a review is possible to do it.

So yeah, happy to review the script, or else to say "find any other test suite contributor willing to review the PR".

Otherwise I'm fine with UUIDs if you prefer.

Julian avatar Nov 15 '23 16:11 Julian

Hello, I'd like to take a shot at this issue. I've read the discussion and understand the task. Can I work on it?

PrateekSingh070 avatar Sep 03 '25 15:09 PrateekSingh070

Hi @Julian,

I'd like to work on adding UUIDs to test cases and individual tests. I've read through the discussion and understand the approach.

My Plan:

  • Start with enum.json from draft2020-12/ (14 test cases, ~40 tests)
  • Add UUID id property to both test cases and individual tests
  • Submit a small, focused PR for easy review

I noticed @mwadams was previously assigned. If they're still working on this, I'm happy to pick a different file. Otherwise, could I proceed with enum.json?

Thanks!

AnirudhJindal avatar Oct 27 '25 06:10 AnirudhJindal

If you want to take on this issue, make a proposal here about how you want to approach the problem. @AnirudhJindal, you're on the right track in that regard, but you need to wait for discussion and agreement that your proposal is the way the community wants to move forward. Open source is slow. Our convention is to give the community at least two weeks to express their opinions.

From what I see so far, the open questions are:

  1. What kind of identifier will be used: slug, UUID, MD5, something else?
  2. What gets an ID? Is it the test-case, the test, or both?
  3. Should IDs be globally unique or unique per version? Most tests have a copy in several different versions. Should they have the same ID?
  4. When a change to a test happens, in what cases should an ID stay the same? When should it change?
  5. How will you document the process?
  6. How will you enforce the process?
  7. How will you roll out the changes?

jdesrosiers avatar Oct 31 '25 00:10 jdesrosiers

Hi @jdesrosiers,

Thank you for the feedback! I understand I moved too quickly and should have waited for community consensus first.

I'd like to create a proper proposal addressing the questions you've outlined. Could you help me understand the preferred format for such proposals in this project? Specifically:

  1. Should I post the proposal as a comment here in this issue, or create a separate discussion/document?
  2. Are there any examples of past proposals in this repo I could reference for format and level of detail?
  3. For each of the 7 questions you listed, should I provide:
    • A recommended approach with justification?
    • Multiple options with pros/cons for community discussion?
    • Both?

I want to make sure I present this in a way that makes it easy for the community to review and provide input over the next two weeks.

I'm asking because I'm a little new here.

thanks for your guidance on this matter!

AnirudhJindal avatar Oct 31 '25 10:10 AnirudhJindal

Could you help me understand the preferred format for such proposals in this project?

There is no preferred format. It's not formal. Just communicate your ideas and your plan clearly.

  1. Should I post the proposal as a comment here in this issue, or create a separate discussion/document?

Yes. Post and discuss here.

  1. Are there any examples of past proposals in this repo I could reference for format and level of detail?

You can look at some other issues, but you won't find any consistency in format or level of detail. Don't worry about these things too much. Just start with anything and we'll go from there. Think of it as a discussion, not a final report. I'm sure we'll go back and forth working some things out. That's normal and expected.

  1. For each of the 7 questions you listed, should I provide:
    • A recommended approach with justification?
    • Multiple options with pros/cons for community discussion?
    • Both?

Any of this is fine, but I suggest leaning more toward recommending an approach with justification. I think there's already been a good amount of discussion and we're at the stage where someone needs to take into consideration everyone's input and propose a solution.

jdesrosiers avatar Oct 31 '25 19:10 jdesrosiers

Hi @jdesrosiers,

Thanks for the feedback! Since IDs need to be somewhat readable yet unique, I propose using a semi-readable structure combined with short UUIDs. Here's the idea:


Option 1: Semi-readable + UUIDs (preferred)

  • Test Case ID: <version-timeline>-<filename>-<testcase-uuid>
  • Test ID: <version-timeline>-<filename>-<testcase-uuid>-<short-uuid>

Example:

  • Test case in enum.json for the December 2020 draft: d20-12-enum-a1b2c3d4e5
  • A test within that test case: d20-12-enum-a1b2c3d4e5-f6g7h8i9j0

Pros:

  • Always unique.
  • Won’t change if descriptions or schemas are updated.
  • Easy to document: we can track just what changed for a given stable ID.

Option 2: Hash-based IDs

  • Test Case ID: <version-timeline>-<filename>-<testcase-hash>
  • Test ID: <version-timeline>-<filename>-<testcase-hash>-<test-hash>

Example:

  • Test case hash: d20-12-enum-9f8e7d6c
  • Test within that case: d20-12-enum-9f8e7d6c-a1b2c3d4

Pros:

  • Reflects changes in description or schema automatically.
  • Shorter IDs.

Cons:

  • Small chance of collisions.
  • IDs change whenever a test changes, making historical tracking slightly more complex.

I can also provide an automation script to generate either type of ID if the approach is approved.

I lean strongly toward Option 1 (UUIDs) as it’s more robust, easier to document, and guarantees uniqueness even across changes.

AnirudhJindal avatar Oct 31 '25 23:10 AnirudhJindal

That's a good start to the conversation.

When Ben expressed a preference for readable identifiers, I'm pretty sure he was referring to the test description. For example, my implementation skips a test in ref.json with the description, "$id with file URI still resolves pointers - *nix". (I don't allow file: to be used in $id for security reasons.) In my test harness, I have code to skip any test case with that description. That's brittle because if the description changes, my test harness breaks, but it also has the advantage that I can easily remember what test I'm skipping by reading the description. If there's an unreadable identifier in my code instead, then it's harder to keep track of what tests I'm skipping. Your semi-readable identifiers don't address that concern.

I think there are a couple of more fundamental concerns that need to be decided before we can discuss the identifier further. The first question is whether there needs to be an identifier for the test case, the test, or both. You seem to have chosen both, which I think is probably right, but I'd like to justify that decision. What are the use cases for each? Why is it necessary for both to have an identifier?

The other fundamental question is to what scope must the identifiers be unique. You seem to have chosen a globally unique identifier. I don't think that's the best choice. Every time we introduce a new version of JSON Schema, we make a copy of the existing tests in a new folder and make changes in the new location. That means we have a bunch of copies of the same test. I think if it's the same test, it should have the same identifier. That would be convenient for my test harness because I could skip the same test in every version I support using one identifier.

So, let's get agreement on those two fundamental questions first.

  1. What needs an identifier? The test case, the test, or both? Why?
  2. What is the scope of uniqueness? Can two identical tests in different versions have the same identifier?

jdesrosiers avatar Nov 04 '25 19:11 jdesrosiers

hi @jdesrosiers ,

my stance on the fundamental questions are as follows:

Question 1: What Needs an Identifier?

I think both test cases AND individual tests need them. Your example with the file URI test is exactly the use case. You don't want to skip ALL of ref.json, just that ONE specific test about file URIs. If we only had test case IDs, you'd be stuck skipping entire categories when you really just need surgical precision for one annoying edge case. On the flip side, sometimes you DO want to skip a whole test case (like if you don't support a feature at all). So both levels make sense to me.

Basically: test case IDs = skip whole categories; individual test IDs = skip specific edge cases. Both are useful depending on the situation.

Question 2: Should IDs be version-specific or not?

OK so I initially suggested version-specific IDs in my earlier comment, but after reading your point, I totally see why that's wrong now.

You're right - if the same test exists across multiple versions, it should have the same ID. I was thinking having version prefixes would make things "clearer" but actually it creates a huge problem. When you upgrade from draft2020-12 to draft-next, your skip lists would break even though the tests didn't actually change - they just moved to a different folder. You'd have to update all your skip lists, bug tracking, and documentation just because of a folder move. That's pointless busywork.

What makes way more sense (like you said) is having version-independent IDs. Then you can skip the same test across all versions with ONE identifier. The version is already obvious from which folder you're running tests from. My bad on that initial suggestion - your approach is way more practical.

What should the IDs actually look like?

I'm thinking hierarchical paths with readable slugs like enum/simple-validation. This solves the readability problem you mentioned - you can actually tell what test you're looking at just by reading the ID. The hierarchical nature with the filename prefix keeps them unique, so we shouldn't run into collision issues. And since there's no version prefix, the same test keeps the same ID whether it's in draft2020-12, draft-next, or any future version. If we do somehow get collisions (which should be rare), we can just append a number like enum/simple-validation-2 or add a short hash suffix like enum/simple-validation-a1b2 to differentiate them.
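The collision fallback described above (appending a number) is easy to make deterministic. A sketch:

```python
def dedupe_slug(slug, taken):
    """Return the slug unchanged if it's unused; otherwise append -2, -3, ...
    until the result is unique within the taken set."""
    if slug not in taken:
        return slug
    n = 2
    while f"{slug}-{n}" in taken:
        n += 1
    return f"{slug}-{n}"
```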

I think this can solve most of the problems we have with different types of IDs without any readability or uniqueness trade-offs.

AnirudhJindal avatar Nov 04 '25 20:11 AnirudhJindal

There's a bit of vocabulary confusion happening here. Let's look at an example.

    {
        "description": "simple enum validation",
        "schema": {
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "enum": [1, 2, 3]
        },
        "tests": [
            {
                "description": "one of the enum is valid",
                "data": 1,
                "valid": true
            },
            {
                "description": "something else is invalid",
                "data": 4,
                "valid": false
            }
        ]
    }

This is a test case that contains two tests. The question is, do we need to identify each test, just the test case, or both? The individual tests are really the things that need to be skippable. Giving an identifier to the test case as well is just a convenience for skipping all of the tests in the test case. In most cases, you'll probably want to skip all the tests, but there are exceptions. For example, the test case for "format": "date-time" includes tests for leap seconds. If your implementation doesn't support leap seconds, you would want to be able to skip just those tests, not the whole test case.

That would suggest to me that just identifying the test case isn't good enough. We either need to do both test cases and tests, or just tests. I'd rather not do both, but would it be too annoying not to be able to skip a whole test case?

What makes way more sense (like you said) is having version-independent IDs. Then you can skip the same test across all versions with ONE identifier. The version is already obvious from which folder you're running tests from.

Agreed. Let's go with that.

I'm thinking hierarchical paths with readable slugs like enum/simple-validation.

I don't think we want the file name to be part of the slug. Something that has happened occasionally is test cases moving to different files. The identifier should still make sense when the test case moves. I think having "enum" in that slug is correct, but because it's a test for the enum keyword, not because it's in the enum.json file. So, maybe enum-simple-validation instead.

I like this approach. What do you have in mind for giving identifiers to existing tests? Going forward, I expect the test case author to construct them by hand, but filling in identifiers for existing tests will have to be automated. It probably has to be something like slugify(filename + testCaseDescription + testDescription). They won't be the nicest for readability, but it's probably the best we can do other than doing each one manually.
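A slugify along those lines might look like this (a sketch, not an agreed implementation):

```python
import re

def slugify(*parts):
    """Join the parts, lowercase them, and collapse anything that
    isn't alphanumeric into single hyphens."""
    text = " ".join(parts).lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")
```

For example, `slugify("enum", "simple enum validation", "one of the enum is valid")` would produce `enum-simple-enum-validation-one-of-the-enum-is-valid` -- long, but mechanical.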

I think the downside of readable slugs is that sometimes descriptions are wrong. I fixed an incorrect description just recently. If the slug contains an incorrect description of the test, we would have to change it, which breaks the original intent of giving the tests identifiers in the first place. The purpose of the identifier is to be something that doesn't change if the description changes. Does that mean slugs aren't a good idea, or do the pros outweigh the cons? I'm not sure. If we don't have readable identifiers, implementers can always use comments next to the opaque identifiers to easily know what they're skipping. So, maybe it's not too terribly important.

Next question: what about enforcement? We'll need a GitHub Action to check that identifiers are unique within a version. Then we need to decide on the rules for when an identifier should change or not. If possible, we should automate checking for that too. If that's not possible to automate, maybe we can introduce a PR template to help remind authors and reviewers to check for those things.

I think if the data changes, the identifier definitely needs to change. But, there are some schema changes that wouldn't require a change. Changing $schema usually doesn't change the semantics of the schema, but could in some cases. Adding/removing/changing annotation-only keywords shouldn't change the schema either. Or, should we play it safe and just say any change to the schema requires a new identifier? That rule we could automate, removing the possibility of human error, but it could result in some unnecessary identifier churn.

Summary / Next steps

  • We decided on using identifiers that are unique within a version of JSON Schema.
  • We still need to decide whether test cases should have identifiers or just tests.
  • We need to explore the trade-offs of using slugs and decide if we're sure we want to use them.
  • We need to discuss further automation and rules for when identifiers change.

jdesrosiers avatar Nov 05 '25 20:11 jdesrosiers

Question 1: Test Cases vs Tests - Both Need IDs

I think we absolutely need both levels of IDs, and the numbers from the actual test suite prove why. When you have files with 44-50 individual tests across 22 test cases, having only individual test IDs would be a massive pain for implementers. Your leap seconds example is perfect for showing why both levels matter. Sometimes an implementer needs to skip just those 2 specific leap second tests because their datetime library doesn't handle them. But other times, an implementer might not support the entire enum keyword at all, and forcing them to list all 44 individual test IDs in their skip configuration is unreasonable. The two-level system gives implementers the choice: use test case IDs for broad exclusions, use individual test IDs for surgical precision. Most of the time they'll probably skip whole test cases, but having the granularity for edge cases is essential. So yes, I strongly vote for doing both test case and individual test IDs.

Question 2: Naming Convention

I'm fully aligned with your suggestion to use keyword-based slugs like enum-simple-validation rather than including the filename.

Question 3: Why Slugs?

I think slugs are the right choice despite the description-change risk. Here's my reasoning: the big part of this feature is so implementers can reliably skip specific tests and track them over time. When someone looks at their skip list and sees enum-heterogeneous-validation/objects-are-deep-compared, they immediately understand what they're skipping. Regarding the concern about test descriptions being wrong -- I think that's exactly why giving tests their own IDs is important. The probability of getting the description wrong for both the test case AND the individual test is very low, making this a highly advantageous trade-off. And even if we do occasionally need to change an ID because a description was incorrect, that's an acceptable cost; it would be a documented breaking change that implementers need to know about anyway. The flexibility and usability that readable slugs provide outweigh the maintenance burden of occasionally updating them when descriptions are genuinely wrong.

Next Steps

I can write a script to add IDs to the existing tests. However, I'm still a little confused about what changes should and shouldn't affect an ID, and I'm still figuring out the GitHub Actions implementation because I'm new to that part of GitHub. I would appreciate guidance on:

  1. Clear rules for when IDs must change vs when they can stay the same
  2. GitHub Actions setup for validation

AnirudhJindal avatar Nov 08 '25 22:11 AnirudhJindal

having only individual test IDs would be a massive pain for implementers.

an implementer might not support the entire enum keyword at all, and forcing them to list all 44 individual test IDs in their skip configuration is unreasonable.

I disagree. It is far from normal and highly discouraged for an implementation to skip entire features. One or two test cases (like the file: URI scheme example) or tests (like the leap seconds example) are all an implementation should really be skipping. It should be limited to a couple of edge cases only. If they're skipping whole features, they either aren't ready to release or are releasing something that isn't compliant with the spec. Either way, these aren't the kinds of situations we should be optimizing for.

Since the amount of skipped tests should be very small, I think it would be best to, at first, only implement identifiers for tests. Then we can wait for feedback and if there's demand for test case identifiers as well, we can add those in a different phase.

Regarding the concern about test descriptions being wrong - I think that's exactly why giving tests their own IDs is important. The probability of getting the description wrong for both the test case AND the individual test is very low, making this a highly advantageous trade-off.

This doesn't make sense to me. I expect the test slug will be a concatenation of the test case description and the test description. So, if the test case description needs to change, that means the test identifiers for all of its tests need to change. So, it's a problem if just the test case description changes, not both the test case AND the test.

I'm also concerned that using slugs based on the description will discourage us from improving the descriptions because we don't want to change identifiers. Theoretically, we don't need to change the slug when we change the description if it's just a rewording, but I'm still concerned people will avoid making improvements to avoid breaking the symmetry between the two. The only way I see slugs being ok is if they aren't generated from the descriptions. Going through every test and creating a slug for each by hand would be an unreasonable task, but maybe it's a good use case for an LLM? We could ask it to make a terse slug for each test and see if it comes up with something reasonable. Otherwise, I'm leaning toward UUIDs or MD5s.

The big part of this feature is so implementers can reliably skip specific tests and track them over time. When someone looks at their skip list and sees enum-heterogeneous-validation/objects-are-deep-compared, they immediately understand what they're skipping.

I'm not convinced that that makes slugs necessary. People can always have comments next to the identifier in their skip list giving them the same quick understanding of what they're skipping. For example, I think this accomplishes the same goal; it's just a little less convenient.

    const skip = new Set([
      "46c9da93-c22f-4882-a2bd-9aa88916f76e" // enum.json | heterogeneous validation | objects are deep compared
    ]);

So, I think slugs are just a nice-to-have and we should feel free to use something else if we don't think it's working out. I think the only way I'd be comfortable with slugs is if they're not simple copies of the descriptions. The LLM idea might be that solution, but it's a lot more complicated than just using an opaque identifier. It would also require a lot more review, as we would have to check that each slug made sense, because LLMs are notorious for doing weird things. Also, I think there could be benefits to using a hash (like MD5) of the schema and data. That way we can automate checking when the identifier needs to change. That is something I'd like to explore further.

I can write a script for updating existing tests with IDs.

It sounds like we're not quite ready for that yet. If you want to explore the LLM generated slugs idea, it would be a good next step to try it on say, enum.json and we can see how well that kind of thing might work.

I'm still a little confused about what changes should and shouldn't affect the ID

That's a big thing we need to work out. I think there are two main approaches we can take.

  1. Any change to a schema, or test data should result in new identifier.

    This has the benefit of allowing us to use a hash of the schema and test data to automate checking identifiers, which is really appealing. The downside is that it could cause unnecessary identifier churn when insignificant changes are introduced. For example, if the test case is testing that something is a string and the test data changes from "foo" to "bar", that would result in a new identifier even though it's fundamentally the same test.

  2. Only changes that affect the fundamental nature of the test should result in a new identifier.

    This would give us maximum control over deciding if a change makes the test truly different than it was before. The downside is that it requires humans to make that decision, which both increases the reviewer burden and introduces the possibility of human error.

I wonder if something in between would work? If we could filter out parts of the schema that don't affect validation like $comment and annotation-only keywords, maybe that's good enough to reduce most potential churn and allow for automated identifier generation. $schema is an interesting case, because sometimes it matters and sometimes it doesn't. We know we want the same tests in different versions to have the same identifier, which means we would have to exclude $schema from the hash, but sometimes the version matters. I'm thinking of $ref that changed behavior in 2019-09. Are there cases where we'd get the same hash if we drop $schema even though they have different results in different versions? I don't think so, but we'd have to look into that.
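To make that middle ground concrete, here's a sketch of the filtering idea. The keyword list is illustrative only (which keywords count as annotation-only would need agreement), and the naive recursion would also strip matching keys that appear inside enum or const values, which a real implementation would have to guard against:

```python
import hashlib
import json

# Illustrative only: the actual annotation-only keyword list would need agreement.
IGNORED = {"$schema", "$comment", "title", "description", "examples",
           "default", "readOnly", "writeOnly", "deprecated"}

def strip_annotations(node):
    """Recursively drop ignored keywords so they can't affect the digest."""
    if isinstance(node, dict):
        return {k: strip_annotations(v) for k, v in node.items() if k not in IGNORED}
    if isinstance(node, list):
        return [strip_annotations(v) for v in node]
    return node

def semantic_digest(schema, data, valid):
    """Hash only the parts of a test that affect validation behavior."""
    canonical = json.dumps([strip_annotations(schema), data, valid],
                           sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

With this, two copies of a test that differ only in $schema or $comment hash identically, while any change to validation keywords or data produces a new digest.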

I'm still figuring out the GitHub Actions implementation because I'm still new to this section of GitHub.

Don't worry about it. Focus on writing the scripts to do the check we want do. Then writing an action to call that script is fairly trivial. We can help if needed, but I think you'll figure it out.

jdesrosiers avatar Nov 10 '25 21:11 jdesrosiers

I disagree. It is far from normal and highly discouraged for an implementation to skip entire features. One or two tests cases (like the file: URI scheme example) or tests (like the leap seconds example) are all an implementation should really be skipping. It should be limited to a couple of edge cases only. If they're skipping whole features, they either aren't ready to release or are releasing something that isn't compliant with the spec. Either way, these aren't the kinds of situations we should be optimizing for.

Since the amount of skipped tests should be very small, I think it would be best to, at first, only implement identifiers for tests. Then we can wait for feedback and if there's demand for test case identifiers as well, we can add those in a different phase.

i totally agree on this.

I also think we should define the identifier format clearly, and then decide what actually counts as changing the identity of a test. What I have in mind is:

<slug-or-uuid>__v<version>__<semantic-hash>

  • slug-or-uuid stays constant across drafts and over time. readable slug or UUID, both work.
  • v<version> only changes when the identity of the test changes.
  • semantic-hash comes from a canonical, filtered version of the test so only real behavior changes affect it.

just to make it clearer:

For example:

Schema:

    {
        "type": "number",
        "minimum": 2
    }

Test:

    { "data": 1, "valid": false }

If we reorder keys, rewrite descriptions, or update $schema, the semantic-hash changes but the behavior doesn’t, so v0 stays v0.

If we change "minimum": 2 to "minimum": 3, that’s a real behavioral change — semantic hash changes and we bump v0 → v1.

I think this gives us more information about the test we are skipping and its nature, and lets us see if its fundamental behavior ever changed.

I’m still unsure about the exact rules for what should affect the hash, so here’s what I think makes sense:

things that should change the semantic hash

  • any modification to validation keywords or their values (type, enum, minimum, etc.)
  • structural changes to validation logic (adding/removing branches, changing required props)
  • changes to $ref or any referenced subschemas
  • changes to test.data that affect the behavior
    (e.g. changing 1 → 2 in a minimum: 2 test)
  • changing valid (true ↔ false)
  • draft-specific semantic differences (like $ref recursion changes) — these should really be separate tests with different IDs anyway

things that should NOT change the semantic hash

  • annotation-only keywords (description, title, $comment, examples, default, readOnly, writeOnly, deprecated)
  • formatting, whitespace, or key order
  • reordering arrays where order doesn’t matter (required, enum, array-type, etc.)
  • $schema (unless the draft changes behavior, in which case it’s basically a new test)
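The filtering rules above could be sketched as a canonical serializer that drops an assumed list of annotation keywords and sorts object keys. A real implementation would need dialect-aware traversal, since this naive recursion would also strip a property literally named "description" inside a properties map:

```javascript
// Assumed annotation-only keywords; the exact list is up for discussion.
const ANNOTATIONS = new Set([
  "description", "title", "$comment", "examples",
  "default", "readOnly", "writeOnly", "deprecated", "$schema",
]);

// Serialize with annotation keywords removed and object keys sorted,
// so descriptions, comments, and key order never affect the hash input.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value)
      .filter((k) => !ANNOTATIONS.has(k))
      .sort();
    const body = keys
      .map((k) => JSON.stringify(k) + ":" + canonicalize(value[k]))
      .join(",");
    return "{" + body + "}";
  }
  return JSON.stringify(value);
}

const a = canonicalize({ $comment: "x", minimum: 2, type: "number" });
const b = canonicalize({ type: "number", minimum: 2, title: "t" });
console.log(a === b); // true: annotations and key order are normalized away
```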

and for the version bump part — i think it’s important to separate “hash changed” from “identity changed”:

things that should bump the version (v0 → v1)

  • the behavior the test asserts has changed
  • the meaning of the schema changed in a way that affects the result
  • the expected outcome (valid) intentionally changed
  • a draft migration that introduces new semantics (e.g., $ref behavior differs)

things that should NOT bump the version

  • cosmetic edits (descriptions, comments, titles)
  • formatting or key reorder
  • rewording test names
  • renaming schema keys that don’t affect validation
  • updating $schema when the behavior stays the same
  • changes to data that don’t affect meaning (e.g., "foo" → "bar" in a simple type: string test)

so basically:

  • semantic hash change = something changed
  • version bump = the identity changed

Hash changes are just a signal; version bumps should follow only when the behavior truly shifts. More often than not, ids will stay constant across versions, but they still may change. I think if we can identify what does and does not affect the id, then we can automate the process very efficiently.

AnirudhJindal avatar Nov 15 '25 10:11 AnirudhJindal

This needs to be way simpler. It's trying to solve every imaginable problem and makes it too complicated for anyone to remember how it works. So, we're either forcing everyone to read a long complicated explanation of how to construct or modify identifiers every time they make a change, or more likely people don't read and understand it and the identifiers end up becoming meaningless because they aren't used consistently.

We need a more practical solution. It needs to be one thing with a simple explanation of how it works, or something that can be fully automated; fully automated is preferred if we think it's good enough. There are three options.

  1. A UUID or slug that changes when we determine the change was meaningful.
  • ➕ The id only changes when it really needs to
  • ➖ Error prone
  2. A hash of the schema, data, and valid.
  • ➕ Can be automated
  • ➖ The id could change more than it needs to
  3. A hash of the normalized schema, data, and valid.
  • ➕ Can be automated. Requires us to build something to normalize the schema, but it's a one-time cost.
  • ➖ The id could change more than it needs to. Even with normalization, some insignificant changes could still result in a new identifier.

Option 2 is the easiest solution and while not perfect, would probably be plenty good enough.

Option 1 has no upfront work, but creates more work for maintainers. Is that extra work worth it? Given that we've gone this long without identifiers and it's not something people are complaining a lot about, it's probably not worth it. It probably doesn't have to be perfect.

Option 3 creates a lot of work up front, but once it's setup, it should be fully automated. Like option 2, it's not perfect, but it's better and also still plenty good enough.

I think I've talked myself out of option 1. I don't think this feature is important enough to justify increasing the maintenance burden. I think option 2 and option 3 are both good enough solutions. So, since you're the one who would be implementing it, I leave it to you to decide if the benefit from option 3 is worth the extra effort.

jdesrosiers avatar Nov 17 '25 21:11 jdesrosiers

Hi @jdesrosiers, I’d like to move forward with Option 2 for now — using a simple MD5 hash of { schema, data, valid } to generate test IDs.

This approach is:

  • fully automated,
  • very easy to explain and review, and
  • gives us something real to test and get feedback on.

Once Option 2 is implemented, we’ll start seeing actual data about where IDs churn. From that real feedback, I’d like to open a separate conversation about what should or shouldn’t be included in a normalized schema (Option 3). The real-world churn from Option 2 will show us which factors genuinely matter and will make the normalization discussion much clearer.

Yes, there can be unnecessary churn with simple hashing — but instead of over-engineering normalization upfront, I think it’s better to ship this first and see how much it actually affects us in practice.

For now, here’s a simple script I wrote to add MD5 IDs to tests that don’t have one yet:

// add-test-ids.js
const fs = require("fs");
const crypto = require("crypto");

function md5(obj) {
  return crypto.createHash("md5").update(JSON.stringify(obj)).digest("hex");
}

function addIdsToFile(filePath) {
  console.log("Reading:", filePath);
  const tests = JSON.parse(fs.readFileSync(filePath, "utf8"));
  let changed = false;
  let added = 0;

  if (!Array.isArray(tests)) {
    console.log("Expected an array at top level, got:", typeof tests);
    return;
  }

  for (const testCase of tests) {
    if (!Array.isArray(testCase.tests)) continue;

    for (const test of testCase.tests) {
      if (!test.id) {
        test.id = md5({
          schema: testCase.schema ?? null,
          data: test.data,
          valid: test.valid,
        });
        changed = true;
        added++;
      }
    }
  }

  if (changed) {
    fs.writeFileSync(filePath, JSON.stringify(tests, null, 2) + "\n");
    console.log(`done – added ${added} ids`);
  } else {
    console.log("no changes – all tests already had ids");
  }
}

// demo: start with enum.json
addIdsToFile("tests/draft2020-12/enum.json");


And here’s a simple script to check that IDs are present and unique inside a version directory:

// check-test-ids.js
const fs = require("fs");
const path = require("path");

function* jsonFiles(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* jsonFiles(full);
    } else if (entry.isFile() && entry.name.endsWith(".json")) {
      yield full;
    }
  }
}

function checkVersion(dir) {
  const missingIdFiles = new Set();
  const duplicateIdFiles = new Set();
  const idMap = new Map(); // id -> file path

  for (const file of jsonFiles(dir)) {
    const tests = JSON.parse(fs.readFileSync(file, "utf8"));
    if (!Array.isArray(tests)) continue; // skip non-test JSON files

    for (const testCase of tests) {
      if (!Array.isArray(testCase.tests)) continue;

      for (const test of testCase.tests) {
        if (!test.id) {
          missingIdFiles.add(file);
          continue;
        }

        if (idMap.has(test.id)) {
          duplicateIdFiles.add(file);
          duplicateIdFiles.add(idMap.get(test.id));
        } else {
          idMap.set(test.id, file);
        }
      }
    }
  }

  console.log("Files missing IDs:");
  if (missingIdFiles.size === 0) {
    console.log("None");
  } else {
    for (const f of missingIdFiles) console.log("- " + f);
  }
  console.log("");

  console.log("Files with duplicate IDs:");
  if (duplicateIdFiles.size === 0) {
    console.log("None");
  } else {
    for (const f of duplicateIdFiles) console.log("- " + f);
  }

  console.log("\nCheck complete.");
}

checkVersion("tests/draft2020-12");


If the scripts look good, I’d like to move on to adding the GitHub Action next so the uniqueness check runs automatically in CI.

Let me know if this direction works — I can open a small PR starting only with enum.json.

AnirudhJindal avatar Nov 20 '25 18:11 AnirudhJindal

Good. This is thinking in the right way, but there's one thing I think you're missing. The point of using a hash is that it can be checked automatically against the actual content of the test. If all we wanted to do was check that the id was unique, we could have just used a random UUID. So, the id-checking script should generate the id from the schema/data/valid fields, compare that id to the one in the test, and report if they don't match. That way it will detect if a test changed and the id wasn't updated.

Given that, consider what would happen if we decide to add normalization after initially generating ids. That normalization could end up changing most of the ids without any of the tests changing. That would break all trust anyone has in those ids being stable. So, although you're thinking along the right lines of "do a small thing and collect data", I think normalization is a detail we need to be thoughtful about upfront if that's something we want to do. And, I think it's something we have to do at least to some extent if we want the same test to have the same id in different versions. For example,

{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "string"
}
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "string"
}

We would want these to have the same id because they are the same test, but they would be different without normalization because the $schema value is different.

I have a fairly easy way to reasonably normalize schemas, but it has a downside. @hyperjump/json-schema processes schemas into an AST. If we use that AST instead of the raw schema, it would completely solve the same-test-different-dialect problem. It would also allow us to ignore the contents of $comment. It would also detect changes to remotes that the test references. So, it's pretty much an ideal solution, except ...

The downside of that approach is that it makes use of a structure only used by @hyperjump/json-schema and that structure could change in the future. In fact, I have some minor changes planned for the future. Ideally, we build something here that's very similar to what is done to build the AST.

Or, we abandon the idea that the same test should have the same id across dialects and don't do any normalization.

jdesrosiers avatar Nov 21 '25 20:11 jdesrosiers

Hi @jdesrosiers,

So, the id-checking script should generate the id from the schema/data/valid fields, compare that id to the one in the test, and report if they don't match. That way it will detect if a test changed and the id wasn't updated.

Totally agree, will update this.

@hyperjump/json-schema processes schemas into an AST.

On the normalization side: I just looked at @hyperjump/json-schema and the AST idea really does look close to ideal in terms of semantics — same test across dialects, ignoring $comment, detecting changes in remotes, etc.

From my perspective, the trade-off now feels like:

1. Building our own “test identity normalization” layer from scratch

  • probably a 2–3 week focused effort to cover edge cases well
  • fully independent of any library, but more design + maintenance cost

vs.

2. Leaning on @hyperjump/json-schema’s AST for normalization

  • much less work to get something good and dialect-aware
  • but we accept that IDs are coupled to that library’s internal representation

I think for our specific use case, we could build a Hyperjump-like normalization layer tailored to just what we need — giving us flexibility and independence. But I'm questioning whether the ends justify the means here. Building our own normalization might introduce more complexity and edge cases than the problem actually warrants, especially when a battle-tested solution already exists that handles the tricky parts (cross-dialect semantics, $ref resolution, etc.).

My take: I think we should build our own normalization script. Yes, it increases maintenance load slightly, but not significantly enough to justify the coupling to Hyperjump's internals. More importantly, if we don't do this now, we'll likely need to do it eventually anyway — either when the AST structure changes or when we want more control over what counts as a "semantic change."

AnirudhJindal avatar Nov 22 '25 08:11 AnirudhJindal

More importantly, if we don't do this now, we'll likely need to do it eventually anyway — either when the AST structure changes or when we want more control over what counts as a "semantic change."

I completely agree.

probably a 2–3 week focused effort to cover edge cases well

I think I can save you a few weeks 😉. I was able to adapt the code used to build the AST to make a normalize function using a similar but simpler strategy. It uses @hyperjump/json-schema to handle all the hard parts like schema identification, following references, and dialect awareness, but it builds the AST itself instead of using the one @hyperjump/json-schema uses internally so it can remain stable even if the internal AST changes.

At this point, how about you create a PR and put it in "draft" mode. That will be a better way to share your scripts and get feedback as you progress.

jdesrosiers avatar Nov 22 '25 23:11 jdesrosiers