GSoC 2025: Better JSON Schema Errors

Open jdesrosiers opened this issue 10 months ago • 47 comments

Create a JavaScript library to convert standard JSON Schema output into clear, human-friendly error messages. The library should follow the examples set by existing tools like Atlassian's better-ajv-errors and Apideck's @apideck/better-ajv-errors, but use the standard JSON Schema output format introduced in draft-2019-09 instead of ajv's proprietary format.

Expected Outcomes

  • A library that transforms standard JSON Schema validation outputs into concise, easy-to-understand error messages.
  • A mechanism for loading and managing additional language packs to support presenting error messages in multiple languages.
  • Customization options for users to override default error messages or add custom ones.
  • Tested for compatibility with multiple implementations using the standard output format, including @hyperjump/json-schema.
  • Published on npm with proper versioning, a clear README, and example use cases.

Skills Required

  • JavaScript: Strong understanding of JavaScript and best practices for building libraries.
  • JSON Schema Specification: Familiarity with the JSON Schema standard, particularly the error output format introduced in draft-2019-09.
  • Error Message Design: Ability to translate structured error output into concise and meaningful human-friendly messages.
  • Library Development: Experience creating, testing, and documenting JavaScript libraries.
  • Open Source Practices: Understanding of Git, GitHub, and how to maintain a project for open source contributions.
  • Testing: Familiarity with Test-Driven Development (TDD) or Behavior-Driven Development (BDD) methodologies.
  • Collaboration: Comfortable using pair programming tools like VSCode LiveShare and participating in pair programming sessions for real-time collaboration.

Mentor(s)

  • @jdesrosiers

Expected Difficulty

Medium

Expected Time Commitment

175 hours

QUALIFICATION TASK

https://github.com/hyperjump-io/json-schema-lite?tab=readme-ov-file#qualification-task

jdesrosiers avatar Jan 27 '25 20:01 jdesrosiers

I'm very interested in the outcome of this to improve my own validator. Let me know if I can help in any way

jviotti avatar Jan 27 '25 20:01 jviotti

Thanks @jdesrosiers for this idea. I am interested in working on this idea in the upcoming GSoC event, if this gets accepted. I have never built a JavaScript library before, so I am not aware of the best practices just yet, but I have a strong understanding of JavaScript. I have also got the Open Source Practices and Collaboration parts covered. I will need to work on Testing, Library Development, and familiarising myself more with JSON Schema.

To gain more understanding of the JSON Schema standard, I am going through the docs. I am excited to get a chance to work on this one. Would you like to share any guidance/resources to prepare for this project idea?

heysujal avatar Jan 28 '25 18:01 heysujal

Thanks for your interest @heysujal. IMO, there's no better way to get familiar with JSON Schema than to write a bunch of schemas. I suggest picking some domain and writing some schemas to model it. It just needs to be complex enough to explore past the basics.

jdesrosiers avatar Jan 28 '25 22:01 jdesrosiers

(This is obviously a really good idea -- I want/wanted at some point soon to have Bowtie collect and compare error messages from implementations, so definitely keen to see where this goes).

Julian avatar Feb 05 '25 23:02 Julian

I'd like to see the rules generalized somehow so that non-JS implementations can also be made.

Also worth mentioning the challenges described in my blog post around the ambiguity of determining a "right" error.

gregsdennis avatar Feb 06 '25 01:02 gregsdennis

I'd like to see the rules generalized somehow so that non-JS implementations can also be made.

That would be nice. However, I don't think the "rules" used by this project are necessarily the rules everyone would want to use. How you present an error doesn't have any one correct answer. For example, of the two libraries I linked, one is optimized for CLIs and the other is optimized for APIs. We can certainly make a test suite in JSON like our validation test suite, but that's probably not what's needed. In any case, the test suite would be a comprehensive set of examples that could be used as a reference for others making similar tools. They can use the test cases to make sure their implementation covers the same situations nicely even if they choose to handle them differently.

jdesrosiers avatar Feb 06 '25 22:02 jdesrosiers

I feel like this is under-specified, or that determining the specification itself is part of the task (which might be beyond the scope and expectations of a GSoC participant).

Are we talking about transforming json schema error objects into a flat list of strings? If so, that's a very easy transformation if the error messages already exist (and some implementations, e.g. mine, already have that capability). Or are we wanting a standardized set of errors that can come from each keyword? (To do that, I would start by inventorying some popular implementations to see what they do, and attempt to come up with something similar or choose the best option of each of these - where "best" is not defined.) Or, perhaps propose an extension to the json schema error specification where standardized error codes could be used, together with sprintf-style arguments, so that an implementation could use a locale library to produce error strings in any language?

karenetheridge avatar Feb 09 '25 20:02 karenetheridge

I feel like this is under-specified, or that determining the specification itself is part of the task (which might be beyond the scope and expectations of a GSoC participant).

I think you're making this into a bigger thing than I had in mind. There's no specification and I don't expect a specification to be a result of this project. It's just a library. Hopefully it will be an example for others to make similar kinds of things, but I don't think there's anything to be standardized.

That said, yes, there is quite a bit that's left open that I expect candidates to provide details for in their proposals. Such as,

  • What audience is being served? (The examples I linked show two: CLI and API users)
  • What special features might be included? (For example, the two libraries I linked provide suggestions for misspelled enums)
  • How are they going to handle the infamously difficult to message oneOf/anyOf case and other tricky situations?

Are we talking about transforming json schema error objects into a flat list of strings?

No, I'd expect more than that. See the examples I linked in the description. better-ajv-errors, which is aimed at CLI output, presents the JSON where the error occurred with messaging inline. @apideck/better-ajv-errors, which is aimed at APIs, includes additional data that might be useful to applications such as an array of the required properties that are missing in addition to messaging.
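To make the API-oriented idea concrete, here is a minimal sketch of an error that carries structured data alongside its message. It is not taken from either linked library: the function name `describeRequiredError` and the shape of the returned object (including the `context.missingProperties` field) are assumptions made for illustration only.

```javascript
// Hypothetical helper: builds an API-style error for a failed `required` keyword.
// `requiredNames` is the value of `required` in the schema, and `instance` is the
// object found at `instanceLocation` that failed validation.
function describeRequiredError(requiredNames, instance, instanceLocation) {
  const missing = requiredNames.filter(
    (name) => !Object.prototype.hasOwnProperty.call(instance, name)
  );
  const noun = missing.length === 1 ? "property" : "properties";
  return {
    instanceLocation,
    message: `Missing required ${noun}: ${missing.join(", ")}`,
    // Structured data an application can act on without parsing the message.
    context: { missingProperties: missing }
  };
}

const result = describeRequiredError(["name", "age", "email"], { name: "Ada" }, "/user");
console.log(result.message); // Missing required properties: age, email
```

An application could use `context.missingProperties` to highlight form fields, while a CLI-oriented tool would more likely render the message next to the offending JSON.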

If so, that's a very easy transformation if the error messages already exist

This tool couldn't use the messages from the validator. It would have to use its own messaging. One of the stated goals of this tool is to be able to provide messaging in multiple languages. Obviously we can't translate messages from arbitrary validators, so we would need to make our own messages and provide translations for those messages.
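One way to picture library-owned messages with translations is a catalog of templates per locale. The sketch below is only an assumption about how that could look: the `catalogs` object, the `{placeholder}` syntax, and the `formatMessage` function are hypothetical and not part of any existing API.

```javascript
// Hypothetical message catalogs keyed by locale, then by JSON Schema keyword.
const catalogs = {
  en: {
    type: "The value at {instanceLocation} must be of type {expected}.",
    minimum: "The value at {instanceLocation} must be at least {expected}."
  },
  fr: {
    type: "La valeur à {instanceLocation} doit être de type {expected}."
  }
};

// Replace each {name} placeholder with the matching value.
// Unknown placeholders are left untouched.
function interpolate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in values ? String(values[name]) : match
  );
}

// Look up the template for a keyword, falling back to English when the
// requested locale has no translation for that keyword.
function formatMessage(locale, keyword, values) {
  const template = catalogs[locale]?.[keyword] ?? catalogs.en[keyword];
  if (template === undefined) {
    throw new Error(`No message template for keyword "${keyword}"`);
  }
  return interpolate(template, values);
}

console.log(formatMessage("fr", "type", { instanceLocation: "/age", expected: "number" }));
// La valeur à /age doit être de type number.
```

Because the templates belong to the library rather than to a validator, the same translations apply no matter which implementation produced the output.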

I didn't talk about this in the description, but one of the benefits of this approach to error messaging is that it decouples the messaging from the implementation. It makes it easier to change the implementation your application uses while knowing the messaging and how your application uses the messaging won't change. This only works if the library provides its own messaging.

Also, a huge motivator for me is to free implementers from the burden of having to worry about messaging and providing the right kind of messaging for every possible audience or not serving every audience. Ideally, implementers can just provide the standard output that provides instance location and schema location. Ideally, there would be multiple libraries that present messaging appropriately for different audiences (like the two examples I linked: CLI and API) and users can choose which one fits their domain independently of what implementation they choose.

Or are we wanting a standardized set of errors that can come from each keyword?

Definitely not.

Or, perhaps propose an extension to the json schema error specification

It wouldn't surprise me if this project inspires some proposals to improve the output specification, but I expect this project to work against the existing spec and nothing more. The schema location and instance location of an error should be enough as long as you have access to the schema and the instance to extract the necessary data to construct the message.
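As a rough illustration of that last point, the sketch below resolves the two locations of a single output unit against the schema and the instance and builds a message from what it finds. The helper names `resolvePointer` and `describe` are hypothetical. It also assumes a schema without references: a `keywordLocation` that passes through `$ref` cannot be resolved against the root schema this way, and `absoluteKeywordLocation` would be needed instead.

```javascript
// Resolve a JSON Pointer (RFC 6901) against a JSON-compatible value.
// The URI fragment form with a leading "#" is tolerated.
function resolvePointer(document, pointer) {
  const path = pointer.startsWith("#") ? decodeURIComponent(pointer.slice(1)) : pointer;
  if (path === "") return document;
  return path
    .split("/")
    .slice(1)
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce((value, token) => (value == null ? undefined : value[token]), document);
}

// Build a message from nothing but the two locations in an output unit.
function describe(unit, schema, instance) {
  const keyword = unit.keywordLocation.split("/").pop();
  const expected = resolvePointer(schema, unit.keywordLocation);
  const actual = resolvePointer(instance, unit.instanceLocation);
  return `${unit.instanceLocation}: "${keyword}" expects ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`;
}

const schema = { type: "object", properties: { age: { type: "integer", minimum: 18 } } };
const instance = { age: 12 };
// One failing output unit; the validator's own `error` text is deliberately unused.
const unit = { valid: false, keywordLocation: "/properties/age/minimum", instanceLocation: "/age" };

console.log(describe(unit, schema, instance)); // /age: "minimum" expects 18, got 12
```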

I hope that helps clarify my vision for this project. Thanks for bringing this up.

jdesrosiers avatar Feb 10 '25 20:02 jdesrosiers

Hi @jdesrosiers,

I’m thrilled about the idea of building a JavaScript library to convert standard JSON Schema (draft-2019-09) validation outputs into human-friendly error messages for GSoC 2025! I have solid JavaScript experience (e.g., building reusable libraries with Node.js) and a growing understanding of JSON Schema from experimenting with @hyperjump/json-schema. I love the challenge of turning technical data into clear, user-friendly messages—something I’ve done in Modern Vibe Homes.

I’d like to propose a library that not only delivers concise error messages but also supports language packs and customization, inspired by tools like better-ajv-errors. I’ve started digging into the draft-2019-09 output format and plan to submit a detailed GSoC proposal soon. Would you be open to reviewing a draft or suggesting specific features you’d like to see? I’d also be happy to prototype a small example if that’s helpful.

Excited to collaborate with you and the Hyperjump community! Looking forward to your thoughts.

GANESHSHARMA1 avatar Feb 28 '25 12:02 GANESHSHARMA1

Hey @jdesrosiers and the team! 👋

I’m Idan Levi, a software engineering undergrad with a strong interest in JavaScript and open-source contributions. I’ve worked with both front-end and back-end technologies like Next.js and express.js, and I have experience using JSON Schema in various projects.

A bit about me:

  • I’m deeply interested in building user-friendly tools and libraries, and this project aligns perfectly with my goal.
  • I’ve worked on full-stack development and API integration projects and enjoy taking on challenges.
  • I’ve also gained experience with Test-Driven Development (TDD), which will be helpful for ensuring this library is properly tested and works across different implementations.

I would love to contribute to the JSON Schema error message library as part of GSoC.

A quick question, regarding the flexibility of error message formatting: Given that this library will provide its own set of messages and aim for multiple languages, how will the library handle complex error scenarios like oneOf and anyOf? Will there be a built-in mechanism for handling these cases, or will it be up to the users to customize the messaging for such edge cases?

Thanks in advance for the opportunity and looking forward to collaborating! 😊

idanidan29 avatar Mar 01 '25 13:03 idanidan29

Hi @jdesrosiers,

I'm interested in contributing to this project for GSoC 2025. I have experience with JavaScript and npm libraries and have been exploring JSON Schema validation errors.

Are there any qualification tasks or prerequisites to complete before applying? Also, which repo should I contribute to for this project?

Looking forward to your response.

Vishv04 avatar Mar 01 '25 16:03 Vishv04

Thanks everyone for showing your interest in this project.

Would you be open to reviewing a draft or suggesting specific features you’d like to see?

I will provide one and only one review of your proposal. Aside from that, all discussion must be in public spaces like here or in the Slack #gsoc channel.

I’d also be happy to prototype a small example if that’s helpful.

I won't be looking at any code or demos aside from the qualification task, but if prototyping helps you think through the issues and ask questions, I think that's a great idea.

A quick question, regarding the flexibility of error message formatting: Given that this library will provide its own set of messages and aim for multiple languages, how will the library handle complex error scenarios like oneOf and anyOf? Will there be a built-in mechanism for handling these cases, or will it be up to the users to customize the messaging for such edge cases?

You tell me 😃. This is the kind of thing I want to see from your proposal. The fact that you've already identified the biggest challenge with error messaging is great. Now, analyze that problem and tell me in your proposal how you think is best to handle those kinds of errors and why.

Are there any qualification tasks or prerequisites to complete before applying?

The qualification task will be announced sometime in the next week.

Also, which repo should I contribute to for this project?

There's no repo yet. This will be a new project built from the ground up. I'll set up a repo when the project start date is approaching.

jdesrosiers avatar Mar 03 '25 04:03 jdesrosiers

Hi @jdesrosiers,

I’m super excited about the opportunity to work on a JavaScript library that transforms standard JSON Schema (draft-2019-09) validation outputs into clear, human-friendly error messages for GSoC 2025! I’ve got a strong grasp of JavaScript (e.g., crafting modular libraries with Node.js) and have been diving into JSON Schema through tools like @hyperjump/json-schema. I love the idea of making technical outputs more accessible.

Inspired by libraries like better-ajv-errors, I’d like to propose a solution that delivers concise messages, supports language packs for multilingual use, and offers customization options. To get started, here’s how I’d approach it:

  1. Parse the Output: Study the draft-2019-09 error format and write a utility to extract key details (e.g., instancePath, schemaPath, message).
  2. Message Templates: Create a default set of human-friendly templates (e.g., “Value at /age must be a number, got string”) with fallback handling.
  3. Language Packs: Design a simple system to load JSON-based language files (e.g., en.json, fr.json) for easy i18n support.
  4. Customization: Add an API for users to override messages or define custom ones via a config object.
  5. Testing & Publishing: Test against @hyperjump/json-schema and other implementations, then package it for npm with a solid README.
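Steps 2 and 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions: `defaultMessages`, `createFormatter`, and the `details` object are hypothetical names, and the sketch uses `instanceLocation`, the field name in the standard output format, rather than ajv's `instancePath`.

```javascript
// Default templates a (hypothetical) library might ship with.
const defaultMessages = {
  type: ({ instanceLocation, expected }) =>
    `Value at ${instanceLocation} must be a ${expected}`,
  required: ({ instanceLocation, expected }) =>
    `Object at ${instanceLocation} is missing "${expected}"`
};

// Build a formatter in which user-supplied templates take precedence
// over the defaults.
function createFormatter(overrides = {}) {
  const messages = { ...defaultMessages, ...overrides };
  return (keyword, details) => {
    const template = messages[keyword];
    // Generic fallback for keywords that have no template at all.
    return template
      ? template(details)
      : `Value at ${details.instanceLocation} failed "${keyword}"`;
  };
}

const format = createFormatter({
  type: ({ instanceLocation, expected }) => `${instanceLocation} should be a ${expected}`
});

console.log(format("type", { instanceLocation: "/age", expected: "number" }));
// /age should be a number
console.log(format("pattern", { instanceLocation: "/email" }));
// Value at /email failed "pattern"
```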

I’m planning to draft a full GSoC proposal soon—would you be willing to share feedback on it? I’d also love to hear your thoughts on these steps or any specific priorities you’d like to emphasize. If it helps, I can whip up a quick prototype to showcase the concept.

Can’t wait to collaborate with you and the Hyperjump team—this feels like a perfect fit for my skills and passion! Thanks for considering me.

GANESHSHARMA1 avatar Mar 03 '25 07:03 GANESHSHARMA1

This sounds like a really cool and useful project! Making JSON Schema validation errors easier to understand will definitely help a lot of developers. Looking forward to seeing how this comes together! 🚀

peter-abhinav avatar Mar 03 '25 13:03 peter-abhinav

Thanks for clarifying all that! I can't wait for the qualification task.

idanidan29 avatar Mar 03 '25 17:03 idanidan29

@GANESHSHARMA1 -- I’d also love to hear your thoughts on these steps or any specific priorities you’d like to emphasize.

That's all fine, but it's all very generic and doesn't really say anything of substance. It's mostly just a summary of the project description. When you write your proposal, I want to see you go a lot deeper. Identify the main challenges you'll face and how you intend to handle them.

jdesrosiers avatar Mar 03 '25 19:03 jdesrosiers

@jdesrosiers -- That's all fine, but it's all very generic and doesn't really say anything of substance. It's mostly just a summary of the project description. When you write your proposal, I want to see you go a lot deeper. Identify the main challenges you'll face and how you intend to handle them.

Sure, I'm working on it. I'm studying the code and its functionality in depth. I will submit my proposal soon, and in it I will describe the challenges and how I intend to solve them.

GANESHSHARMA1 avatar Mar 04 '25 03:03 GANESHSHARMA1

Hey @jdesrosiers I am eager to contribute to the project of building a JavaScript library for transforming JSON Schema validation outputs into human-friendly error messages. With a solid foundation in JavaScript and a growing interest in JSON Schema, I aim to create a library that simplifies error understanding for developers. I plan to implement multi-language support, customization options, and thorough testing for compatibility. This project will help me deepen my skills in library development, error message design, and open-source collaboration. I am excited to learn, contribute, and deliver a tool that enhances the developer experience with JSON Schema.

I am eager to learn, collaborate, and deliver a tool that simplifies schema exploration for developers worldwide.

Kashika23 avatar Mar 05 '25 17:03 Kashika23

Qualification Task

There's probably no better way to prepare for a project like this than to implement the JSON Schema output format for yourself. So, that's what the qualification task is going to be.

I've provided a simple JSON Schema implementation that implements the Flag output format. Your task is to update it to support either the Basic or Detailed output formats.
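For orientation, the difference between Flag and Basic can be sketched as follows. This is not the code in json-schema-lite: the nested `detailed` result, the helper names `toFlag` and `toBasic`, and the `error` text are made up for illustration. The sketch keeps every nested unit in the flat list; which units actually belong there should be checked against the specification.

```javascript
// Hypothetical nested validation result, with made-up error text.
const detailed = {
  valid: false,
  keywordLocation: "",
  instanceLocation: "",
  errors: [
    {
      valid: false,
      keywordLocation: "/properties",
      instanceLocation: "",
      errors: [
        {
          valid: false,
          keywordLocation: "/properties/age/type",
          instanceLocation: "/age",
          error: "Illustrative message only"
        }
      ]
    }
  ]
};

// Flag: nothing but the overall boolean result.
function toFlag(result) {
  return { valid: result.valid };
}

// Basic: the overall result plus one flat list of output units.
function toBasic(result) {
  if (result.valid) return { valid: true };
  const errors = [];
  const visit = (unit) => {
    // Drop the nesting and the per-unit flag, keeping locations and message.
    const { errors: children = [], valid, ...rest } = unit;
    errors.push(rest);
    children.forEach(visit);
  };
  (result.errors ?? []).forEach(visit);
  return { valid: false, errors };
}

console.log(toFlag(detailed)); // { valid: false }
console.log(toBasic(detailed).errors.length); // 2
```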

The implementation and more details about what I expect can be found at https://github.com/hyperjump-io/json-schema-lite. Good luck!

jdesrosiers avatar Mar 06 '25 06:03 jdesrosiers

Thanks @jdesrosiers for sharing the qualification task! I'll start working on implementing the Basic or Detailed output format in json-schema-lite. I’ll also begin drafting my proposal based on my approach and findings.

Vishv04 avatar Mar 06 '25 06:03 Vishv04

@jdesrosiers may I suggest that the Verbose format be considered over Detailed? In my experience, trying to figure out which nodes/branches should be retained vs. pruned, especially in an automated way, proved difficult and ultimately unsuccessful. The Verbose output is quite straightforward.

gregsdennis avatar Mar 06 '25 06:03 gregsdennis

Hey @jdesrosiers what is the deadline for submitting the qualification task?

idanidan29 avatar Mar 06 '25 11:03 idanidan29

may I suggest that the Verbose format be considered over Detailed? In my experience, trying to figure out which nodes/branches should be retained vs. pruned, especially in an automated way, proved difficult and ultimately unsuccessful. The Verbose output is quite straightforward.

There are two reasons I chose not to allow the Verbose output. The goal of this project is to be something that works with a variety of implementations. I've never heard of anyone other than you and me who has supported Verbose. The other reason is that I thought it was too trivial a task, especially given the starting point I'm giving them. All that's left to do, and what I want them to do, is to go through each keyword and figure out what needs to be retained vs pruned.

I expect them to run into a few ambiguities and I expect we'll discuss them and decide on the expected behavior together. I did enough of the assignment myself to be confident that it's not too hard, but what I gave them is a different approach from how I implemented it in my validator so there might be hard parts I'm not aware of. I kinda hope there are. This project is highly experimental. It's likely that we'll find that some things we want to do just aren't possible. If we see some of that in the qualification task, I'll get to see how they deal with those kinds of roadblocks.

jdesrosiers avatar Mar 06 '25 20:03 jdesrosiers

what is the deadline for submitting the qualification task?

I don't think we've set a deadline for qualification tasks as an organization. I'm willing to accept it as long as the application period is open (April 6), but you might want to get it in early enough to make use of the feedback in your application. I'd suggest trying to get it in by the time the application period begins (March 24).

Also, remember that I'm doing these reviews in my spare time, so turnaround time will likely not be fast. The sooner you get it in, the more likely I'll have a review for you in time to inform your application. I can't guarantee that you'll get a review if it's close to the submission deadline.

jdesrosiers avatar Mar 06 '25 20:03 jdesrosiers

Hello @jdesrosiers, should we implement both the Detailed and Basic output formats, or just one of them?

arpitkuriyal avatar Mar 08 '25 18:03 arpitkuriyal

Should we implement both the Detailed and Basic output formats, or just one of them?

Just one. The instructions are "either Basic or Detailed".

jdesrosiers avatar Mar 08 '25 23:03 jdesrosiers

Hello! I’m interested in working on this issue and have some experience handling JSON schema validation in similar projects. To make sure I’m aligned with the project’s style, could you clarify if there’s a preferred way to handle nested schema validation errors? Happy to submit a draft PR once I have a bit more context.

pradip-1111 avatar Mar 10 '25 18:03 pradip-1111

IMPORTANT

Everyone, please re-read the qualification task description. I've updated it with a clarification of the requirements.

jdesrosiers avatar Mar 11 '25 21:03 jdesrosiers

@pradip-1111

To make sure I’m aligned with the project’s style, could you clarify if there’s a preferred way to handle nested schema validation errors?

Are you talking about the qualification task or the project?

For the qualification task, you should be using the official output format. How things are nested depends on which format you choose. You'll find what you need to know in the JSON Schema specification.

For the project, it's your task to analyze, discuss, and propose a way to present nested errors that you think is best. The projects I linked to in the project description could give you some inspiration, but I expect you to tell me the result of your analysis in your GSoC application.

jdesrosiers avatar Mar 11 '25 21:03 jdesrosiers

@idanidan29 I'm excited you're interested in this project. I believe the focus should be on the application (proposal + qualification task) deadline, which is April 8. However, it's better to submit earlier so that the mentor has time to review your task before you start writing your proposal.

Honyii avatar Mar 12 '25 07:03 Honyii