Consider upgrading to a conformant, maintained, JSON Schema implementation

Open jeremyfiel opened this issue 1 year ago • 8 comments

ajv has long been the "go-to" JSON Schema validation implementation, but more recently it seems the maintainer has focused his efforts on his own variation of JSON validation, JTD (JSON Type Definition, https://datatracker.ietf.org/doc/rfc8927/).

There have been few updates to the JSON Schema validation capability since 2021 and more importantly, the support for 2020-12 is still lacking functionality. This capability is vital to the development of OpenAPI 3.1.x+ tooling like Redocly. One glaring bug has been open for two years which is directly related to how the OAS 3.1.x schema is written. ajv doesn't support validation of the OAS metaschema; it fails to resolve $dynamicRef. https://github.com/ajv-validator/ajv/issues/1573
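For readers unfamiliar with the keyword, here is a minimal sketch of what `$dynamicRef` does, modeled on the recursive tree example from the JSON Schema 2020-12 specification. The `https://example.com/...` identifiers are placeholders, and the schemas are written as TypeScript constants purely for illustration; this is not the OAS 3.1 metaschema itself.

```typescript
// A generic tree: each child is validated against whatever schema
// currently owns the dynamic anchor "node".
const tree = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  $id: "https://example.com/tree",
  $dynamicAnchor: "node",
  type: "object",
  properties: {
    data: true,
    children: {
      type: "array",
      items: { $dynamicRef: "#node" },
    },
  },
};

// An extension of the tree that forbids unknown properties.
// Because it re-declares the dynamic anchor "node", the $dynamicRef inside
// `tree` resolves back to this schema when evaluation starts here, so the
// restriction applies at every level of nesting, not only at the root.
const strictTree = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  $id: "https://example.com/strict-tree",
  $dynamicAnchor: "node",
  $ref: "tree",
  unevaluatedProperties: false,
};
```

A validator that treats `$dynamicRef` like a plain `$ref` would check the children against `tree` only, and would therefore accept a misspelled property on a nested node that `strictTree` is meant to reject.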

JSON Schema has recently written an implementation test suite to verify the capability of different tools with respect to the specification. You can find more information about Bowtie here. ajv is, by far, the worst offender in the test suite.

I'm not here to dictate which implementation should be used, other than recommending a validator that conforms closer to 100% of the JSON Schema test suite.

jeremyfiel Aug 14 '23 19:08

Hi @jeremyfiel,

Thanks for the idea. However, we use our own fork of AJV with various fixes and improvements. Also, AJV is considered to be the fastest JSON Schema validator out there, so I don't anticipate we'll switch to something else in the near future.

tatomyr Aug 15 '23 13:08

Hi @tatomyr

Comparing your branch with ajv, it doesn't appear you have fixed the dynamic referencing bug which I linked. Speed is great, but if the validator doesn't conform to the specification, that's not a great case for using it for future development, especially when it doesn't support the latest version of the thing your tool is supposed to support, OAS 3.1.x.

It would be interesting to run your own implementation against the Bowtie project to get a good baseline for your fork if you want to continue using it. https://docs.bowtie.report/en/stable/implementers/

jeremyfiel Aug 15 '23 14:08

I would be interested in seeing the evidence that AJV is the fastest JSON Schema validator out there.

mjperrone Aug 15 '23 15:08

Thanks for raising this and chiming in with these suggestions, it's truly valuable! I'll be honest that this would be a major overhaul to switch the library, and it's not currently on our roadmap so I'm not making any promises! That said, you're correct that we want to keep an eye on the longer term and to remain open to changing things during the (hopefully) long lifetime of this tool. I'm leaving this issue open for us to keep the discussion running, hear what alternatives are available, and what issues people do encounter.

lornajane Aug 17 '23 13:08

Hey @jeremyfiel! Thanks for raising it.

The Bowtie project lists only one good JavaScript tool, https://github.com/hyperjump-io/json-schema. It looks pretty feature-complete, but it is not as well documented as ajv, and it's a pretty new project, so not as battle-tested. I would definitely evaluate it, though, to understand it better.

Right now I'm leaning towards trying to fix any real issues our community has with ajv (like the $dynamicRef one). While ajv may fail on many edge cases, that doesn't mean all of those edge cases matter in practice.

We'll look into this $dynamicRef one and ideally we will contribute the fix back to ajv or we can just keep it in our fork.

RomanHotsiy Aug 17 '23 13:08

I'm the author of @hyperjump/json-schema and I work full-time on JSON Schema. I can provide a few insights.

Compatibility

The results you see in Bowtie are for ajv in its default mode, which is its proprietary (and problematic) "strict mode". If you turn strict mode off, ajv's compatibility is good enough, with the exception of the $dynamicRef bug, which there appears to be no intention of ever fixing. Since you already maintain a fork and have experience modifying ajv, I would highly encourage you to take a stab at fixing the $dynamicRef bug. A lot of people would thank you, and there was even a bounty offered. Based on what is already supported of $dynamicRef, I think the hard part should be done. I don't think it would take much to get it over the finish line if you're already familiar with the ajv code.

Speed

The differences in speed between any of the implementations you might consider would be unnoticeable for an application like this. The only case where it would make sense to optimize for speed is something like an API server that processes large volumes of requests. At some value of "large", the tiny differences could add up to something that could end up saving you money by choosing a faster implementation. For a cli tool, the speed differences are irrelevant.

What's fastest will depend very much on your use case. Do you have large schemas or small schemas? Do you validate many instances against the same schema or just a few? All of these things and more will be factors. The only way to know what's fastest for you is to try it in your application and measure. Ajv excels when you run a very large number of instances against a single schema. It gets its speed by compiling the schema to a very fast function and using that function to validate instances. However, the compile step is quite slow, so if you don't validate a lot of instances, you don't get the speed benefit. Last I checked, @hyperjump/json-schema was faster than ajv at a single validation, but ajv catches up and quickly surpasses it when many validations are performed.
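The compile-once trade-off described above can be sketched with a toy validator. This is a hypothetical illustration covering only `type` and `required`; it is not how ajv or @hyperjump/json-schema is actually implemented (ajv generates source code rather than closures), and all names here are made up for the example.

```typescript
// Toy schema subset, for illustration only.
type ToySchema = {
  type?: "string" | "number" | "object";
  required?: string[];
};

type Validator = (instance: unknown) => boolean;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Interpreting: the schema is inspected again on every call.
function interpret(schema: ToySchema, instance: unknown): boolean {
  if (schema.type === "object") {
    if (!isPlainObject(instance)) return false;
  } else if (schema.type !== undefined && typeof instance !== schema.type) {
    return false;
  }
  if (schema.required !== undefined && isPlainObject(instance)) {
    for (const key of schema.required) {
      if (!hasOwn(instance, key)) return false;
    }
  }
  return true;
}

// Compiling: the schema is inspected once, and the returned function
// only runs the checks that were selected up front.
function compile(schema: ToySchema): Validator {
  const checks: Validator[] = [];
  const expected = schema.type;
  if (expected === "object") {
    checks.push(isPlainObject);
  } else if (expected !== undefined) {
    checks.push((instance) => typeof instance === expected);
  }
  const required = schema.required;
  if (required !== undefined) {
    checks.push(
      (instance) =>
        !isPlainObject(instance) || required.every((key) => hasOwn(instance, key)),
    );
  }
  return (instance) => checks.every((check) => check(instance));
}

const schema: ToySchema = { type: "object", required: ["openapi", "info"] };
const validate = compile(schema); // setup cost is paid once, here
const good = { openapi: "3.1.0", info: {} };
const bad = { openapi: "3.1.0" };
const results = [
  validate(good),
  validate(bad),
  interpret(schema, good),
  interpret(schema, bad),
];
```

Both paths give the same answers; they differ only in when the work happens. If compiling costs some fixed time up front and saves a little on each call, it pays off only once the number of validations exceeds the setup cost divided by the per-call saving, which is why measuring in the real application matters.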

I don't think speed should be relevant in your situation, but since it was asked, I believe the fastest implementation to be @exodus/schemasafe. It's also fully spec compliant last time I checked. However, it lags in extensibility behind ajv and especially @hyperjump/json-schema.

Hyperjump - JSON Schema

The @hyperjump/json-schema implementation has a lot of benefits, but also some gaps that you might not expect.

Benefits

Since I work full-time on JSON Schema, you can expect this implementation to be always up-to-date and for bugs (or questions) to be addressed promptly. I use this implementation to evaluate new features being considered to be added to the JSON Schema spec, so there's no lag between new features being introduced and those features being available to users.

The implementation works in the browser and bundles efficiently. On the server it has support for TypeScript as well as JavaScript.

Support for validating OpenAPI documents out of the box.

Includes an implementation of the official bundling process. This is the only such tool I know of that exists. For now, it doesn't bundle OpenAPI documents, only JSON Schemas, but it wouldn't be too hard to add OpenAPI support.

Includes tools for working with annotations. This is the only implementation of this kind available anywhere.

The implementation is designed for extensibility. If you need a custom keyword or even to work with schemas to do something other than validation, there are tools included to help you do that.

Gaps

The biggest thing that's missing that you'll want to be aware of is human readable output. This implementation uses the official output format, but doesn't include human readable messages in that output. The user will see what value failed against what keyword, but they'll have to look at the schema and instance to see why. For example, if an object fails the required keyword, it wouldn't explicitly say which required property is missing.
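As a rough sketch of how a caller could bridge this gap for the `required` case, the helper below looks up the failing keyword in the schema and the failing value in the instance, then names the missing properties. The `OutputUnit` shape and both function names are assumptions made for this example, loosely modeled on the keyword and instance locations in the standard output format; they are not @hyperjump/json-schema's actual API.

```typescript
// Hypothetical output unit: JSON Pointers to the failing keyword and value.
type OutputUnit = { keywordLocation: string; instanceLocation: string };

// Resolve a JSON Pointer (plain form, without a leading "#") against a document.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc;
  let current: unknown = doc;
  for (const raw of pointer.split("/").slice(1)) {
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[token];
  }
  return current;
}

// Turn a failed `required` unit into a message that names the missing properties.
function describeRequiredFailure(
  schema: unknown,
  instance: unknown,
  unit: OutputUnit,
): string {
  const required = resolvePointer(schema, unit.keywordLocation);
  const target = resolvePointer(instance, unit.instanceLocation);
  if (!Array.isArray(required) || typeof target !== "object" || target === null) {
    // Not a `required` failure we know how to explain; fall back to locations.
    return `Validation failed at "${unit.instanceLocation}" (${unit.keywordLocation})`;
  }
  const missing = required.filter(
    (name): name is string =>
      typeof name === "string" && !Object.prototype.hasOwnProperty.call(target, name),
  );
  const where =
    unit.instanceLocation === "" ? "the document root" : `"${unit.instanceLocation}"`;
  const noun = missing.length === 1 ? "property" : "properties";
  return `${where} is missing required ${noun}: ${missing.join(", ")}`;
}

const exampleSchema = { properties: { info: { required: ["title", "version"] } } };
const exampleInstance = { info: { title: "Example" } };
const message = describeRequiredFailure(exampleSchema, exampleInstance, {
  keywordLocation: "/properties/info/required",
  instanceLocation: "/info",
});
// message is: "/info" is missing required property: version
```

The same lookup pattern could be extended keyword by keyword, which is roughly the work a tool would take on if it adopted a validator that reports locations without messages.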

The other thing you might not expect is that the optional format-assertion vocabulary isn't implemented. That means that the format keyword will behave only as an annotation and not as a validator. In most implementations, there's a configuration option you can set that enables validating format. This functionality could be added as a plugin.
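The difference between `format` as an annotation and `format` as an assertion can be shown with a small, self-contained sketch. Everything here is hypothetical (the `evaluateFormat` name, the `assertFormat` flag, and the shape-only `date` check); it is meant to show the two behaviors, not any implementation's real plugin interface.

```typescript
type FormatResult = { valid: boolean; annotations: Record<string, string> };

// Hypothetical checker table with a deliberately simple, shape-only date check.
const formatCheckers: Record<string, (value: string) => boolean> = {
  date: (value) => /^\d{4}-\d{2}-\d{2}$/.test(value),
};

function evaluateFormat(
  format: string,
  instance: unknown,
  assertFormat: boolean,
): FormatResult {
  // The annotation is always recorded, whether or not we assert.
  const annotations = { format };
  const checker = formatCheckers[format];
  if (!assertFormat || typeof instance !== "string" || checker === undefined) {
    // Annotation-only mode (or nothing to check): never fails validation.
    return { valid: true, annotations };
  }
  // Assertion mode: the format check decides validity.
  return { valid: checker(instance), annotations };
}

const asAnnotation = evaluateFormat("date", "not-a-date", false);
const asAssertion = evaluateFormat("date", "not-a-date", true);
const wellFormed = evaluateFormat("date", "2023-08-14", true);
```

In annotation mode the malformed value still passes and the caller only learns that the schema declared `format: date`; in assertion mode the same value fails.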

This implementation is ESM only, which can be a problem for packages that are still using CommonJS.

jdesrosiers Aug 23 '23 20:08

Huge thanks to @jdesrosiers for this comprehensive overview! I don't have any work scheduled on this but the extra context is very much appreciated over here. Thank you!

lornajane Sep 01 '23 16:09

Thanks @jdesrosiers!

Great work on the @hyperjump/json-schema, btw! We'll definitely keep an eye on it!

RomanHotsiy Sep 03 '23 03:09