fast-json-stringify icon indicating copy to clipboard operation
fast-json-stringify copied to clipboard

optimize enums with a single value

Open bmeck opened this issue 2 years ago • 7 comments

Checklist

Before and after log:

bmeck@Bradleys-MBP fast-json-stringify % npm run bench # before

> [email protected] bench
> node ./benchmark/bench.js # before

single enum string....................................... x 33,928,717 ops/sec ±0.32% (196 runs sampled)
const string............................................. x 1,034,264,612 ops/sec ±0.15% (196 runs sampled)
^C
bmeck@Bradleys-MBP fast-json-stringify % npm run bench # after

> [email protected] bench
> node ./benchmark/bench.js # after

single enum string....................................... x 1,036,391,465 ops/sec ±0.10% (196 runs sampled)
const string............................................. x 1,031,954,256 ops/sec ±0.24% (195 runs sampled)
^C

bmeck avatar Nov 11 '23 20:11 bmeck

You can read this thread for more details. https://github.com/fastify/fast-json-stringify/discussions/532#discussioncomment-3781347

ivan-tymoshenko avatar Nov 11 '23 21:11 ivan-tymoshenko

@ivan-tymoshenko this seems a bit odd since you don't do similar checks for things like const, is there a clear difference here with const? I'm not sure how to sanely validate against a single value at a glance in the code here. lib/schema-validator doesn't seem to have a clear way to check against a deep property. I'm fine with this staying blocked however since I was just driving by and was surprised at a massive sudden drop in speed on me poking with different schemas.

bmeck avatar Nov 11 '23 22:11 bmeck

The problem is how the serialization function would behave in case when it receives a value that is not in the enum.

const schema = { type: 'string', enum: ['foo', 'bar'] }
const serialize = fjs(schema)
serialize('baz') // should return baz IIRC
const schema = { type: 'string', enum: ['foo'] }
const serialize = fjs(schema)
serialize('baz') // will return 'foo'

IMO this is confusing. I agree that situation with const looks very close. Unfortunatelly this library doesn't have a crear answer to this question. You can read the thread I sent you. It's exactly about this problem.

You still will have some performance boost if you add a sanity check.

ivan-tymoshenko avatar Nov 11 '23 22:11 ivan-tymoshenko

My few recomendations on adding enum support:

  • I would add it for strings only
  • if received value is in an enum, it should insert it, if value is not in the enum it should sanitize received string (default bahaviour)

This will give you the performance boost, because you will avoid the string sanitizing in a hot path.

ivan-tymoshenko avatar Nov 11 '23 22:11 ivan-tymoshenko

@ivan-tymoshenko I did read that thread but it didn't enlighten me; per https://json-schema.org/draft/2020-12/json-schema-validation#section-6.1.3

6.1.3. const The value of this keyword MAY be of any type, including null.

Use of this keyword is functionally equivalent to an "enum" (Section 6.1.2) with a single value.

bmeck avatar Nov 11 '23 22:11 bmeck

Can you tell me what this library should do if it receives a value that is not in the enum?

ivan-tymoshenko avatar Nov 11 '23 22:11 ivan-tymoshenko

@ivan-tymoshenko Per the specification, having a value outside of the single value of the enum means that the value passed in isn't valid and is seemingly subject to the README:

While the schema is currently validated for any developer errors, there is no guarantee that supplying user-generated schema could not expose your application to remote attacks.

Users are responsible for sending trusted data. fast-json-stringify guarantees that you will get a valid output only if your input matches the schema or can be coerced to the schema. If your input doesn't match the schema, you will get undefined behavior.

Additionally from that thread linked:

fast-json-stringify guarantees that you will get a valid output only if your input matches the schema or can be coerced to the schema. If your input doesn't match the schema, you will get undefined behavior.

So sending values outside of those explicit values (including null) would be invalid per the schema provided. So the suggestion above of:

if received value is in an enum, it should insert it, if value is not in the enum it should sanitize received string (default bahaviour)

Would actually produce an invalid value for the schema and should not be done.

Since these are values are outside of valid values for the schema it seems likely to be up to implementer desires according to the README and thread linked; I don't think checking the value is needed given these points existing.

I'm not a maintainer so it is hard to say what is desired by maintainers, but that the spec call out of equivalence isn't valid here is surprising. My only comment is that likely const and single value enum should properly align. Personally, I'd rather expect every value outside of the schema to simply be allowed undefined behavior. The README however does list a few cases where things are explicitly validated and thrown and I'm less clear on motivations for those so once again I would struggle to say when things should be checked. Having only enum check while const does not invalidates none of the README guarantees.

Calling out and validating enums is certainly possible to hot path outside of single values with some steps like you suggest above but seems out of my general cause of making this PR which was one of surprise.

bmeck avatar Nov 12 '23 03:11 bmeck