postgres-json-schema
postgres-json-schema copied to clipboard
Provide partial support for string "format" constraint
Provide partial support for string "format" constraint
Formats that correspond to native PostgreSQL data types are implemented. These are
date-time, date, time, duration, uuid, ipv4, ipv6, regex
Email format validation by a regex suggested by pbaumard.
Consistent with current behaviour unsupported options validate positive.
"email" format validation may be added with one extra line in the case list that uses the popular regular expression:
WHEN 'email' THEN IF target !~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$' THEN RAISE; END IF;
It would not be as strict as the rest though. The regular expression might be brushed up too.
Nice work. I've thought about adding this myself and can see it might be useful (especially for uuid).
Have you performance tested this though? Something in the back of my head thinks that adding exception handling adds a bit of overhead.
Fair enough, exception handling does add some overhead. Yet given the relative complexity of regular expressions and ISO8601 validation (620+ timezone names and abbreviations) I think that this is a price worth paying. The performance loss is not that significant. Benchmark on a very modest laptop - 100K {"type":"number"} validate in 11.4 s, 100K {"type":"string", "format":"date-time"} validate in 12.9 s.
Thanks for checking. What are the with exception handling and without exception handling benchmark times?
It's worth mentioning because people might validate large amounts of data (if triggers run on updates for example) which might have unintended consequences based on the current version.
"email" format validation may be added with one extra line in the
caselist that uses the popular regular expression:WHEN 'email' THEN IF target !~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$' THEN RAISE; END IF;It would not be as strict as the rest though. The regular expression might be brushed up too.
Using the regexp in https://dba.stackexchange.com/a/165923 following HTML5 email spec would surely be better:
WHEN 'email' THEN IF target !~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' THEN RAISE; END IF;
it does not follow RFC 5321, section 4.1.2 specified in json-schema format, but as written in HTML5 email spec:
This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
Validating 100K {"type":"number"} with commented "format" section (i.e. the original function with no exception handling code) took the same 11.3 s. There is no performance change unless {"type":"string", "format":"date-time"} is hit and the exception machinery gets invoked.
"email" format validation may be added with one extra line in the
caselist that uses the popular regular expression:WHEN 'email' THEN IF target !~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$' THEN RAISE; END IF;It would not be as strict as the rest though. The regular expression might be brushed up too.
Using the regexp in https://dba.stackexchange.com/a/165923 following HTML5 email spec would surely be better:
WHEN 'email' THEN IF target !~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' THEN RAISE; END IF;it does not follow RFC 5321, section 4.1.2 specified in json-schema format, but as written in HTML5 email spec:
This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
Nice & clean. I have added it to the PR, quoting your comment/suggestion.