multipleOf less than 1 doesn't validate with `json_matches_schema`
Bug report
- [X] I confirm this is a bug with Supabase, not with my own application.
- [X] I confirm I have searched the Docs, GitHub Discussions, and Discord.
Describe the bug
Number schemas with a `multipleOf` less than 1 don't validate as expected. For example, 17.2 should be valid against the schema `{"type": "number", "multipleOf": 0.1}`, but `json_matches_schema` returns false.
I have verified the expected behavior using jsonschemavalidator.net.
To Reproduce
Steps to reproduce the behavior:
Run the following in a Postgres database with the extension installed:
```sql
SELECT json_matches_schema(
  '{"type": "number","multipleOf": 0.1}',
  '17.2'
);
```
The above statement returns false.
Expected behavior
The above statement should return true.
Screenshots
N/A
System information
- PostgreSQL: v16 (Neon)
Additional context
N/A
This is a bug in the jsonschema crate used by pg_jsonschema. Specifically, this function first uses float types to check whether one value is a multiple of another, and falls back to fraction::BigFraction only if the result is NaN, apparently to save allocations. The float calculation doesn't always work. E.g. for your specific values of 17.2 and 0.1, the division gives 17.2 / 0.1 = 171.99999999999997, and this value modulo 1 is 171.99999999999997 % 1 = 0.9999999999999716, which is much larger than f64::EPSILON. The fix is probably to always use BigFraction for these calculations.
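For reference, a tiny Rust snippet (standard library only, no jsonschema involved) that reproduces the float arithmetic described above:

```rust
fn main() {
    let value: f64 = 17.2;
    let multiple_of: f64 = 0.1;

    // Division in f64 does not produce an exact integer here.
    let quotient = value / multiple_of;
    println!("{quotient}"); // prints 171.99999999999997

    // The fractional part is close to 1, far above f64::EPSILON,
    // so a remainder-vs-EPSILON check rejects the value.
    let remainder = quotient % 1.0;
    println!("{remainder}"); // prints roughly 0.9999999999999716
    println!("{}", remainder < f64::EPSILON); // false
}
```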
@johncmacy would you like to open a bug in the jsonschema repo? @Stranger6667 should be able to help you.
The JSON Schema spec defines `multipleOf` like this:
> The value of "multipleOf" MUST be a number, strictly greater than 0. A numeric instance is valid only if division by this keyword's value results in an integer.
And on validation of numeric instances:
> The JSON specification allows numbers with arbitrary precision, and JSON Schema does not add any such bounds. This means that numeric instances processed by JSON Schema can be arbitrarily large and/or have an arbitrarily long decimal part, regardless of the ability of the underlying programming language to deal with such data.
However, the spec does not require a specific numeric representation (e.g., IEEE 754 vs. decimal). This is problematic for f64 because of its inherent floating-point imprecision. For instance, 17.2 / 0.1 = 171.99999999999997 in f64. Since the result is not an exact integer, multipleOf validation fails, even though mathematically 17.2 is a multiple of 0.1. Decimal arithmetic avoids this by keeping exact precision during division.
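To illustrate why exact decimal arithmetic sidesteps the problem, here is a minimal, hypothetical Rust sketch that scales decimal literals to integers before checking divisibility. This is not the jsonschema crate's code; `parse_decimal` and `is_multiple_of` are made-up helpers for illustration only:

```rust
// Hypothetical sketch: treat JSON number literals as exact decimals by
// scaling them to integers before the divisibility check.

fn parse_decimal(s: &str) -> (i128, u32) {
    // Split e.g. "17.2" into mantissa 172 and 1 fractional digit.
    match s.split_once('.') {
        Some((whole, frac)) => {
            let mantissa: i128 = format!("{whole}{frac}").parse().unwrap();
            (mantissa, frac.len() as u32)
        }
        None => (s.parse().unwrap(), 0),
    }
}

fn is_multiple_of(value: &str, multiple_of: &str) -> bool {
    let (v, v_scale) = parse_decimal(value);
    let (m, m_scale) = parse_decimal(multiple_of);
    // Bring both numbers to the same number of fractional digits,
    // then the check is plain integer divisibility.
    let scale = v_scale.max(m_scale);
    let v = v * 10_i128.pow(scale - v_scale);
    let m = m * 10_i128.pow(scale - m_scale);
    v % m == 0
}

fn main() {
    assert!(is_multiple_of("17.2", "0.1"));   // 172 % 1 == 0  -> valid
    assert!(!is_multiple_of("17.25", "0.1")); // 1725 % 10 != 0 -> invalid
    println!("exact decimal checks pass");
}
```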
The jsonschema crate defaults to f64 because it relies on the serde_json parser, which parses numbers as f64 by default. While serde_json has an arbitrary_precision feature that supports exact numeric values (numbers are stored internally as strings), it must be enabled at compile time and is off by default.
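A small example of where the f64 enters, assuming serde_json with its default features:

```rust
use serde_json::Value;

fn main() {
    // With serde_json's default features, JSON numbers become f64,
    // so the decimal literal 17.2 is replaced by the nearest binary double.
    let parsed: Value = serde_json::from_str("17.2").unwrap();
    let as_float = parsed.as_f64().unwrap();

    // Printing with extra digits exposes the representation error.
    println!("{as_float:.20}"); // roughly 17.19999999999999928946
}
```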
In any event, I think most of the multipleOf issues in the jsonschema crate (there are already several besides this one) come from mismatched expectations about the data type: many users expect decimal semantics, but the values are double-precision floats. To improve the situation, I believe the jsonschema crate could potentially add:
- better documentation about number representation
- a feature to switch between f64 & decimal
Right now, there is a WIP PR that clarifies the matter. See also this issue.
> feature to switch between f64 & decimal
If jsonschema exposes a feature flag, then pg_jsonschema will enable it to fix this issue.
Thank you both for the explanation. I understand better why it's a problem, but I still think this is a bug that needs to be fixed, not simply a need for better documentation. It sounds like there already are efforts to fix this - do you need me to create an issue, or is that redundant?
FWIW, I'd put my vote in for this package (pg_jsonschema, the Postgres extension) defaulting to the decimal implementation. IMHO, that would result in the fewest surprises.
> It sounds like there already are efforts to fix this - do you need me to create an issue, or is that redundant?
I think this issue is a good place to track progress, so no new issue is needed right now. I plan to consolidate all the related information there and clarify a few points with the JSON Schema folks, then work on the implementation.
Thanks, much appreciated!
In recent versions there is an arbitrary-precision flag in jsonschema. Enabling it here should solve the issue (with some performance hit).