Add `coercer`
Something like in Plumatic Schema. This would do the job of both `validator` and `encoder`/`decoder` (and perhaps also `explainer`, if that can be incorporated efficiently) in one pass. Having to first run `validator` and then `encoder`/`decoder` (or `explainer`) is not optimal and a bit of a footgun IMHO.
This could also allow transcoder ("coercer"?) functions that can fail instead of passing invalid inputs through unchanged.
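As a sketch of what such a failing coercer could look like (a hypothetical helper, not an API proposal), it can be composed from the existing `decoder`, `validator` and `explainer`:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; hypothetical helper: decode + validate in one call,
;; throwing instead of passing invalid input through unchanged
(defn failing-coercer [schema transformer]
  (let [decode (m/decoder schema transformer)
        validate (m/validator schema)
        explain (m/explainer schema)]
    (fn [x]
      (let [v (decode x)]
        (if (validate v)
          v
          (throw (ex-info "coercion failed" (explain v))))))))

((failing-coercer :keyword (mt/json-transformer)) "kukka")
;; => :kukka
```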
A single sweep might not be faster unless it's implemented so that the generated methods stay small, e.g. using some trampoline / loop. The JIT compiler compiles small methods into native code - the current separate phases (validation, decoding & explaining) are all small and separate.
Dummy testing with Plumatic & Malli:
(require '[schema.core :as s])
(require '[schema.coerce :as sc])
(require '[malli.core :as m])
(require '[malli.transform :as mt])
(require '[criterium.core :as cc])
(defn json-schema-coercer [schema]
  (sc/coercer schema sc/json-coercion-matcher))

(defn json-malli-coercer [schema]
  (let [decode (m/decoder schema (mt/json-transformer))
        validate (m/validator schema)
        explain (m/explainer schema)]
    (fn [x] (let [v (decode x)] (if (validate v) v (explain v))))))

(defn bench! [coerce value]
  (prn (coerce value))
  (cc/quick-bench (coerce value))
  (println))
(let [valid "kukka"
      invalid 1]
  ;; Plumatic
  (let [coerce (json-schema-coercer s/Keyword)]
    ;; 35ns
    (bench! coerce valid)
    ;; 33ns
    (bench! coerce invalid))
  ;; Malli
  (let [coerce (json-malli-coercer :keyword)]
    ;; 34ns
    (bench! coerce valid)
    ;; 121ns
    (bench! coerce invalid)))
(let [valid {:name "kikka"
             :address {:street "haavikontie", :zip 33800}
             :tags ["kikka" "kukka"]}
      invalid {:name "kikka"
               :address {:street "haavikontie", :zip 33800}
               :tags ["kikka" "kukka" false]}]
  ;; Plumatic
  (let [coerce (json-schema-coercer {:name s/Str
                                     :address {:street s/Str
                                               :zip s/Int}
                                     :tags #{s/Keyword}})]
    ;; 4.4µs
    (bench! coerce valid)
    ;; 7.3µs
    (bench! coerce invalid))
  ;; Malli
  (let [coerce (json-malli-coercer [:map
                                    [:name :string]
                                    [:address [:map
                                               [:street :string]
                                               [:zip :int]]]
                                    [:tags [:set :keyword]]])]
    ;; 1.8µs
    (bench! coerce valid)
    ;; 2.8µs
    (bench! coerce invalid)))
That is not how JITs work. Smaller methods are more likely to get inlined, but larger ones will be compiled too.
And the combined methods would be smaller than the sum of the separate ones anyway, so quite small.
I believe both the call depth and the (bytecode) size matter for how/if the JIT optimizes the code. Three (middleware-)function chains of depth 6 vs one chain of depth 18 perform differently, as the latter blows the inlining depth budgets (and most likely the size budget too). An interceptor executor running 18 independent functions would also behave differently - it might be slower, or not: more work for the loop / executor, but small functions and shallower call stacks. There is an `:analyze` profile in reitit; we got good results by checking that the hot code fitted into the default budgets (of the JVMs of those days).
I didn't check the Schema source to see how it combines transform + validation + explain into a single sweep, but with this dummy test setup it was not faster - actually, it seems slower.
`parser` is basically `validator` + `decoder` with hardcoded decoders, so there is no reason to believe `coercer` would be much more bulky or indirect than `parser` or `decoder`. There is also no reason to believe compiler optimizations would be more effective than doing fewer passes over the data -- blowing the inlining budget just means fewer optimizations and more call overhead, not that we are stuck in interpreter mode forever.
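For comparison, `m/parse` already works this way - a single pass that either returns the parsed value or a failure marker instead of the input:

```clojure
(require '[malli.core :as m])

;; parse validates and (where the schema defines it) transforms in one pass,
;; returning :malli.core/invalid on failure instead of the input
(m/parse :int 1)   ;; => 1
(m/parse :int "1") ;; => :malli.core/invalid
```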
I think `explain` would be better kept as a separate pass for the erroneous slow path, since it must allocate the errors and plumb them around.
Schema is probably all-round slower for unrelated reasons.
`m/coercer`
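malli has since shipped `m/coercer` (and the one-shot `m/coerce`), which decode and validate in a single call and by default throw on invalid input. A usage sketch, worth double-checking against the current malli docs:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; decode + validate in one pass; throws on invalid input
(def coerce-keyword (m/coercer :keyword (mt/json-transformer)))

(coerce-keyword "kukka") ;; => :kukka
(coerce-keyword 1)       ;; throws ex-info carrying the explain data
```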