
Add `coercer`

Open · nilern opened this issue on Mar 22, 2021 · 5 comments

Something like `coercer` in Plumatic Schema. This would do the job of both the validator and the encoder/decoder (and perhaps also the explainer, if that can be incorporated efficiently) in one pass. Having to first run the validator and then the encoder/decoder (or explainer) is not optimal and a bit of a footgun IMHO.

nilern · Mar 22, 2021

This could also allow transcoder ("coercer"?) functions that can fail instead of passing invalid inputs through unchanged.
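
A rough sketch of what that could look like when composed from the existing malli parts (`m/decoder` + `m/validator` + `m/explainer`); the `coercer` helper below is hypothetical, not malli's API at the time of writing:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical helper: decode, validate, and fail (throw) on invalid input
;; instead of passing it through unchanged. `explain` only runs on the
;; erroneous slow path.
(defn coercer [schema transformer]
  (let [decode (m/decoder schema transformer)
        valid? (m/validator schema)
        explain (m/explainer schema)]
    (fn [x]
      (let [v (decode x)]
        (if (valid? v)
          v
          (throw (ex-info "coercion failed" (explain v))))))))

(def ->keyword (coercer :keyword (mt/json-transformer)))

(->keyword "kikka") ;; => :kikka
(->keyword 1)       ;; throws ex-info carrying the explain data
```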

nilern · Mar 22, 2021

A single sweep might not be faster unless it's implemented so that the generated methods remain small, e.g. using some trampoline / loop. The JIT compiler will compile small methods into native code - the current separate phases (validation, decoding & explaining) are all small and separate.
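
A rough sketch of the "loop over small phases" idea: each phase stays a small, separately compiled function, and a tiny executor threads the value through them (hypothetical helper, not malli internals):

```clojure
;; Hypothetical executor: each phase is a small function; the loop keeps
;; the individual methods tiny and separately JIT-compilable.
(defn run-phases [phases x]
  (reduce (fn [acc phase] (phase acc)) x phases))

(run-phases [inc inc str] 1)
;; => "3"
```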

Dummy testing with Plumatic & Malli:

(require '[schema.core :as s])
(require '[schema.coerce :as sc])

(require '[malli.core :as m])
(require '[malli.transform :as mt])

(require '[criterium.core :as cc])

(defn json-schema-coercer [schema]
  (sc/coercer schema sc/json-coercion-matcher))

(defn json-malli-coercer [schema]
  (let [decode (m/decoder schema (mt/json-transformer))
        validate (m/validator schema)
        explain (m/explainer schema)]
    ;; decode first, then validate; explain only runs for invalid values
    (fn [x] (let [v (decode x)] (if (validate v) v (explain v))))))

(defn bench! [coerce value]
  (prn (coerce value))
  (cc/quick-bench (coerce value))
  (println))

(let [valid "kukka"
      invalid 1]

  ;; Plumatic
  (let [coerce (json-schema-coercer s/Keyword)]

    ;; 35ns
    (bench! coerce valid)

    ;; 33ns
    (bench! coerce invalid))

  ;; Malli
  (let [coerce (json-malli-coercer :keyword)]

    ;; 34ns
    (bench! coerce valid)

    ;; 121ns
    (bench! coerce invalid)))



(let [valid {:name "kikka"
             :address {:street "haavikontie", :zip 33800}
             :tags ["kikka" "kukka"]}
      invalid {:name "kikka"
               :address {:street "haavikontie", :zip 33800}
               :tags ["kikka" "kukka" false]}]

  ;; Plumatic
  (let [coerce (json-schema-coercer {:name s/Str
                                     :address {:street s/Str
                                               :zip s/Int}
                                     :tags #{s/Keyword}})]

    ;; 4.4µs
    (bench! coerce valid)

    ;; 7.3µs
    (bench! coerce invalid))

  ;; Malli
  (let [coerce (json-malli-coercer [:map
                                    [:name :string]
                                    [:address [:map
                                               [:street :string]
                                               [:zip :int]]]
                                    [:tags [:set :keyword]]])]

    ;; 1.8µs
    (bench! coerce valid)

    ;; 2.8µs
    (bench! coerce invalid)))

ikitommi · Apr 8, 2021

That is not how JITs work. Smaller methods are more likely to get inlined, but larger ones will be compiled too.

And the combined methods would be smaller than the sum of the separate ones anyway, so quite small.

nilern · Apr 8, 2021

I believe both the call depth and the (bytecode) size affect how/if the JIT optimizes the code. Three (middleware-)function chains of depth 6 vs one chain of depth 18 perform differently, as the latter blows the inlining depth budget (and most likely the size budget too). An interceptor executor running 18 independent functions would also behave differently - it might be slower, or not: more work for the loop / executor, but small functions and shallower call stacks. There is an :analyze profile in reitit; I got good results by checking that the hot code fitted into the default budgets (of the JVMs of those days).
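
For reference, the JIT's inlining decisions (and the budgets involved) can be inspected with HotSpot's diagnostic flags; a minimal deps.edn alias sketch (the alias name is made up, the flags are standard HotSpot options):

```clojure
;; deps.edn
{:aliases
 {:inline-debug
  {:jvm-opts ["-XX:+UnlockDiagnosticVMOptions"
              "-XX:+PrintInlining"        ;; log which call sites were inlined, and why not
              "-XX:+PrintCompilation"]}}} ;; log which methods were JIT-compiled
```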

I didn't check from the Schema source how it combines transform + validation + explain into a single sweep, but with the dummy test setup above it was not faster - actually, it seems slower.

ikitommi · Apr 8, 2021

`parser` is basically validator + decoder with hardcoded decoders, so there is no reason to believe `coercer` would be much more bulky or indirect than `parser` or `decoder`. There is also no reason to believe compiler optimizations would be more effective than doing fewer passes over the data: blowing the inlining budget just means fewer optimizations and more call overhead, not that we are stuck in interpreter mode forever.

I think `explain` would be better kept as a separate pass for the erroneous slow path, since it must allocate the errors and plumb them around.

Schema is probably all-round slower for unrelated reasons.

nilern · Apr 9, 2021

`m/coercer`
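
A usage sketch of the resulting function (the exact arities and error behaviour should be checked against the malli docs; this assumes `m/coercer` takes a schema and a transformer and throws on invalid input):

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

(def coerce-keyword (m/coercer :keyword (mt/json-transformer)))

(coerce-keyword "kukka") ;; => :kukka
(coerce-keyword 1)       ;; throws instead of passing the invalid value through
```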

ikitommi · Dec 9, 2022