
Distinguish schema for validation and encoding/decoding

Open mehmetegemen opened this issue 2 years ago • 7 comments

Description

Currently, our schemas for encoding/decoding and for validation exist in the same domain, distinguished only by relative resource paths.

Motivation

If we don't distinguish between schemas that abide by our codec standard and schemas that abide by the JSON Schema standard, we break the constraints of our encode/decode schemas, and all schema-processing logic becomes bound to the superset JSON Schema standard. This may make it harder to extend the codebase in the future if we want to add processing logic that depends on the encoding/decoding or validation context. Therefore, we should distinguish these two schema types.

mehmetegemen avatar May 18 '22 13:05 mehmetegemen

Maybe, the other way around, it would be better not to distinguish the schemas but to unify the definition? For example, maybe there is a way to use the codec-standard schema for validating the JSON as well?

shuse2 avatar May 18 '22 14:05 shuse2

We can create a subset of JSON Schema that defines our entire data-structure space. The important question here is whether the validation space should be merged with the encoding space. Either we can have separate schemas and extend them, or we can add context-specific markers to child objects inside unified schemas. The unified approach costs less, since we would have a single schema everywhere in our codebase; it is just less flexible. If we define the boundaries of the flexibility we want from a schema, then we can go the unified way, which means this requires research. Unification or segregation: both are costly in different ways. This issue needs careful thought; maybe we should open it up for research.

mehmetegemen avatar May 18 '22 14:05 mehmetegemen

Actually, I think the same schema can be used for both validation and codec. The main problem is that a JS instance looks like

{
  nonce: 10n
}

and in JSON format it is

{
  nonce: '10'
}

and we need to be able to specify which one is the intended format at that point. If that's possible, maybe we can use one schema for everything.
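As an illustration of that idea, a single codec-style schema could drive the JSON-to-instance conversion: the `dataType` marker tells us that the JSON string should become a `BigInt`. This is a hypothetical sketch; `uint64Schema` and `fromJSON` are invented names, not Lisk SDK APIs.

```typescript
// Hypothetical sketch: one schema drives both forms. The codec-style
// "dataType" tells us how a JSON value maps to its JS instance.
type CodecProperty = { dataType: string };
type CodecSchema = { properties: Record<string, CodecProperty> };

const uint64Schema: CodecSchema = {
  properties: { nonce: { dataType: 'uint64' } },
};

// Convert a JSON object ({ nonce: '10' }) to a JS instance ({ nonce: 10n }).
function fromJSON(
  schema: CodecSchema,
  json: Record<string, unknown>,
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, prop] of Object.entries(schema.properties)) {
    const value = json[key];
    // Only uint64 is handled here to keep the sketch minimal.
    result[key] = prop.dataType === 'uint64' ? BigInt(value as string) : value;
  }
  return result;
}
```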

shuse2 avatar May 18 '22 14:05 shuse2

My concern is not convertibility; my concern is uncontrolled extensibility and leakage of logic from isolated modules into schemas via operators like "anyOf" and "oneOf" (https://json-schema.org/understanding-json-schema/reference/combining.html) if we allow everything from the JSON Schema standard, since we use AJV.
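One way to picture the concern: AJV happily accepts combinator keywords, but a codec-bound schema would have to stay inside the subset the codec understands. A hypothetical guard (names invented, not lisk-codec code) could reject such schemas up front:

```typescript
// Hypothetical guard: reject JSON Schema combinators that fall outside
// the codec subset, so validation-only keywords cannot leak into schemas
// that are also used for encoding/decoding.
const UNSUPPORTED_KEYWORDS = ['anyOf', 'oneOf', 'allOf', 'not'];

function assertCodecSubset(schema: Record<string, unknown>): void {
  for (const keyword of UNSUPPORTED_KEYWORDS) {
    if (keyword in schema) {
      throw new Error(`Keyword "${keyword}" is not part of the codec subset`);
    }
  }
  // Recurse into nested property schemas.
  const props = schema.properties as
    | Record<string, Record<string, unknown>>
    | undefined;
  if (props) {
    for (const child of Object.values(props)) {
      assertCodecSubset(child);
    }
  }
}
```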

mehmetegemen avatar May 18 '22 14:05 mehmetegemen

I didn't try it though 😆

mehmetegemen avatar May 18 '22 14:05 mehmetegemen

Another concern is that we collect all schemas in the same compiled-schemas property of the codec. So, as you said, if we use a validation schema for encode/decode, we pollute the codec's compiled schemas. At first glance, for our simple usage and your simple example, we could use a converter function inside .validate() and .encode()/.decode() so that lisk-validator and lisk-codec behave differently. But using a single schema for two purposes means constraining the schema to a subset, because you cannot make a lossless conversion from the JSON Schema superset to our codec subset. Think of it like converting BigInt to Number: you can do it for small numbers, but not for numbers beyond the finite set below 2^53. The first thought that comes to my mind is having markers inside the schema: default behavior for the codec subset and extended behavior for the superset JSON Schema. The second is extending the schema for validation, but this requires an extension property, so it's not a unified schema. The third is creating a shared set for validation and encoding/decoding, and this is costly. So there are tradeoffs between the options.
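The first option (markers inside a unified schema) could look roughly like the sketch below. All names here are hypothetical: each property keeps the codec `dataType` as the default behavior, and an optional validation-only block carries superset JSON Schema keywords that the codec would ignore.

```typescript
// Hypothetical "markers" option: one schema, codec subset by default,
// with a validation-only marker holding superset JSON Schema keywords.
interface UnifiedProperty {
  dataType: string;                      // codec subset (default behavior)
  validation?: Record<string, unknown>;  // JSON Schema superset, ignored by codec
}

interface UnifiedSchema {
  properties: Record<string, UnifiedProperty>;
}

// The codec would read only dataType; the validator merges in the marker.
function toValidationSchema(schema: UnifiedSchema): Record<string, unknown> {
  const properties: Record<string, unknown> = {};
  for (const [key, prop] of Object.entries(schema.properties)) {
    properties[key] = { dataType: prop.dataType, ...prop.validation };
  }
  return { type: 'object', properties };
}
```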

mehmetegemen avatar May 19 '22 06:05 mehmetegemen

Currently, the codec makes only small assumptions about the schema:

  • The format is JSON Schema, so we can walk through properties or items depending on the type
  • It has dataType for encodable/decodable types
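These two assumptions can be sketched as a recursive walk (illustrative names, not the actual lisk-codec implementation): descend into "properties" for objects and "items" for arrays, and consume only "dataType" at the leaves.

```typescript
// Sketch of the codec's walk: JSON Schema shape for traversal,
// dataType at the leaves; every other keyword is ignored.
type WalkSchema = {
  type?: 'object' | 'array';
  dataType?: string;
  properties?: Record<string, WalkSchema>;
  items?: WalkSchema;
};

function collectDataTypes(schema: WalkSchema, path = ''): string[] {
  if (schema.type === 'object' && schema.properties) {
    return Object.entries(schema.properties).flatMap(([key, child]) =>
      collectDataTypes(child, path ? `${path}.${key}` : key),
    );
  }
  if (schema.type === 'array' && schema.items) {
    return collectDataTypes(schema.items, `${path}[]`);
  }
  return schema.dataType ? [`${path}:${schema.dataType}`] : [];
}
```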

All other properties are ignored and not supported. I think the points below are your ideas, but I'm not sure exactly what each one means (they look quite similar):

  • having markers inside the schema: default behavior for the codec subset and extended behavior for the superset JSON Schema
  • extending the schema for validation, which requires an extension property, so it's not a unified schema
  • creating a shared set for validation and encoding/decoding, which is costly

A random idea is:

validator.validateJSON(codecSchema, object)

and internally convert the codec schema to a JSON schema and use it for validation, or add code to AJV so that it understands dataType and how to validate it.
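The first variant of that idea could be sketched as follows. The mapping table and function names are assumptions for illustration (they mirror the earlier example where a uint64 BigInt is serialized as a decimal string in JSON), not actual lisk-validator internals.

```typescript
// Hypothetical converter: translate each codec dataType into the JSON-format
// type it takes on the wire, so a plain JSON Schema validator (e.g. AJV)
// can check the JSON form.
const DATA_TYPE_TO_JSON: Record<string, Record<string, unknown>> = {
  uint32: { type: 'integer', minimum: 0 },
  uint64: { type: 'string', pattern: '^[0-9]+$' }, // BigInt serialized as string
  bytes: { type: 'string' },                       // hex-encoded in JSON
  boolean: { type: 'boolean' },
  string: { type: 'string' },
};

function codecToJSONSchema(codecSchema: {
  properties: Record<string, { dataType: string }>;
}): Record<string, unknown> {
  const properties: Record<string, unknown> = {};
  for (const [key, prop] of Object.entries(codecSchema.properties)) {
    properties[key] = DATA_TYPE_TO_JSON[prop.dataType] ?? {};
  }
  return { type: 'object', properties, required: Object.keys(properties) };
}
```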

shuse2 avatar May 19 '22 06:05 shuse2

Closing this issue due to inactivity.

shuse2 avatar Dec 20 '23 16:12 shuse2