malli
malli copied to clipboard
re schema doesn't properly validate values with line breaks
(malli.core/validate [:re #"^thing$"] "thing\n")
=> true
Compare to
(re-matches #"^thing$" "thing\n")
=> nil
This ultimately boils down to the difference in java.util.regex.Matcher methods find and matches (clojure.core/re-findvs. clojure.core/re-matches).
(let [pattern (Pattern/compile "^thing$")
matcher (.matcher pattern "thing\n")]
(.matches matcher))
=> false
(let [pattern (Pattern/compile "^thing$")
matcher (.matcher pattern "thing\n")]
(.find matcher))
=> true
The javadoc on java.util.regex.Matcher/find states:
* @return {@code true} if, and only if, a subsequence of the input
* sequence matches this matcher's pattern
whereas the javadoc for java.util.regex.Matcher/matches states:
Attempts to match the entire region against the pattern.
To me, at least, the functionality of java.util.regex.Matcher/matches, i.e clojure.core/re-matches, is what I'm expecting.
Interesting. Tested with few alternatives:
Plumatic
(require '[schema.core :as schema])
(schema/validate #"thing" "thing\n")
; => "thing\n" (match)
JSON Schema Spec
{"type": "string", "pattern": "^thing$"}
// => null (match)
Javascript
RegExp('^thing$').exec('thing\n')
// => null (no match)
So, Malli works currently like Plumatic and JSON Schema do, but not like default JavaScript.
Do you have more external references (on languages / libraries etc) on what is the correct way to handle this? Options:
- keep the current way as others (JSON Schema + Plumatic) are doing it like this
- change the default as it's obviously less correct, need to handle JSON Schema mapping
- new option to
m/-re-schemato override this, e.g. users can configure this
I think the behavior for re-find is definitely a little sneaky with it allowing a newline at the end, but I think using re-find behavior is more intuitive than re-matches behavior.
For example if I have the schema
[:re #"\dcans"]
I would expect that to match any string that contained <num>cans anywhere in the string, e.g. one can, 2cans, 3cans or something like that. If I wanted my string to start with <num>cans, then I'd use ^. re-matches behavior is basically forcing ^ and $ semantics when sometimes that's not what you want. You can always opt in to that behavior by using ^ and $.
I should also mention that you can always use \z instead of $ if you actually want the end of the input, $ is technically The end of a line and \z is the end of the input.
(re-find "^string\z" "string\n") => nil
See https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
I didn't even know about \z. I think that's good enough for me.