json-kotlin-schema
Validation of email address with non-ASCII character fails
Hi,
The JSON that I am trying to validate has a non-ASCII character in the email field (e.g. the Spanish letter "ñ"), and consequently validation fails with the following error:
A subschema had errors - #/email
Value fails format check "email", was "mu\[email protected]" - #/email
Here is the test code:
fun test() {
    val schemaString = """
        {
          "${'$'}schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "email": {
              "type": "string",
              "format": "email"
            }
          },
          "required": ["email"]
        }
    """.trimIndent()
    val jsonString = """
        {
          "email": "muñ[email protected]"
        }
    """.trimIndent()
    val schema = JSONSchema.parse(schemaString)
    val output = schema.validateBasic(jsonString)
    require(output.errors == null) {
        output.errors?.forEach {
            println("${it.error} - ${it.instanceLocation}")
        }
        "Json schema validation failed."
    }
}
Is there a way around this?
Hi, thanks for the message.
In implementing this library I have attempted to strictly follow the JSON Schema specification, which says (JSON Schema Validation, section 7.3.2):
email: As defined by the "Mailbox" ABNF rule in RFC 5321, section 4.1.2
And RFC 5321, section 4.1.2 contains the following ABNF rules:
Mailbox = Local-part "@" ( Domain / address-literal )
Local-part = Dot-string / Quoted-string
Dot-string = Atom *("." Atom)
Atom = 1*atext
atext is defined in RFC 5322, section 3.2.3, as being the ASCII alphabetic and numeric characters, plus the following ASCII special characters:
! # $ % & ' * + - / = ? ^ _ ` { | } ~
The Quoted-string rule allows any combination of ASCII characters within double quotes, but even that does not allow characters above hex 7E. In fact, the specification goes on to say:
Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127).
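To make the effect of these rules concrete, the Dot-string form of Local-part can be approximated with a regular expression. This is only a sketch, not the library's actual implementation: the names atext, dotString and isDotString are made up for illustration, the Quoted-string form is not covered, and the domain part is ignored entirely.

```kotlin
// Character class for "atext": ASCII letters, digits and the listed specials.
// (Illustrative name; "\$" and "\\-" are Kotlin/regex escapes for "$" and "-".)
val atext = "[A-Za-z0-9!#\$%&'*+\\-/=?^_`{|}~]"

// Dot-string = Atom *("." Atom), where Atom = 1*atext
val dotString = Regex("$atext+(\\.$atext+)*")

// Returns true if the whole local part is a valid Dot-string.
fun isDotString(localPart: String): Boolean = dotString.matches(localPart)

fun main() {
    println(isDotString("first.last")) // true
    println(isDotString("señor"))      // false: "ñ" is not in atext
    println(isDotString(".leading"))   // false: an Atom must precede each dot
}
```

Any local part containing a character outside ASCII fails this check, which is consistent with the error reported above.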
I realise that in practice, many mail systems may ignore these rules and allow non-ASCII characters in mail addresses, but I feel that as an implementer of JSON Schema I have no option but to follow the specification as closely as possible.
All this explanation doesn't help in your case, but you might like to try a pattern validation instead: the emailregex web site contains a number of suggestions (the form of Regex used by the library is of course the Java form).
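As a rough sketch of that workaround, the schema could replace "format": "email" with a "pattern" keyword. The pattern below is a hypothetical, deliberately permissive one chosen for illustration (it is not taken from the emailregex site and is not a complete email validator), and EMAIL_PATTERN and matchesEmailPattern are made-up names. The resulting schemaString could then be passed to JSONSchema.parse in the same way as in the test code above.

```kotlin
// Hypothetical permissive pattern: text "@" text "." text, with no whitespace
// and no additional "@". Non-ASCII letters such as "ñ" are accepted.
const val EMAIL_PATTERN = "^[^\\s@]+@[^\\s@]+\\.[^\\s@]+\$"

// Same schema as in the question, but using "pattern" instead of "format".
// Backslashes are doubled because the regex is embedded in a JSON string.
val schemaString = """
    {
      "${'$'}schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "pattern": "${EMAIL_PATTERN.replace("\\", "\\\\")}"
        }
      },
      "required": ["email"]
    }
""".trimIndent()

// Local check of the pattern with the Java regex engine. JSON Schema "pattern"
// is a search rather than a full match, hence containsMatchIn plus ^ and $.
fun matchesEmailPattern(value: String): Boolean =
    Regex(EMAIL_PATTERN).containsMatchIn(value)

fun main() {
    println(schemaString)
    println(matchesEmailPattern("señor@example.com")) // true
    println(matchesEmailPattern("no-at-sign"))        // false
}
```

The trade-off is that a pattern like this gives up the strict RFC 5321 checks, so it will also accept addresses that the "email" format would rightly reject.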
I may consider allowing pluggable implementations of the format validations in a later version of the library, but I can't give you a timeline for that.
Sorry I can't be more help,
-Peter Wall
Thanks, that was very informative. I will look into other options.