govalidator
Email validation regex too strict
The long and complicated regex you are presently using to validate email addresses is way too strict according to the RFC for email addresses.
Below is a list of valid email addresses, of which your regex matches fewer than half.
[email protected]
[email protected]
[email protected]
[email protected]
" "@example.com
"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/[email protected]
[email protected]
!def!xyz%[email protected]
[email protected]
much."more\ unusual"@example.com
very.unusual."@"[email protected]
very."(),:;<>[]".VERY."very@\\ "very"[email protected]
!#$%&'*+-/=?^_`{}|[email protected]
Miles.O'[email protected]
postmaster@☁→❄→☃→☀→☺→☂→☹→✝.ws
allen@[127.0.0.1]
allen@[IPv6:0:0:1]
root@localhost
john@com
I'd recommend changing the regex to just this: ^.+@.+$
I'm against changing the regex to such an oversimplified one – you can add your own struct validation tag or your own custom validation function instead:
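```go
// Registers a lax "email_lax" tag (needs "regexp" and "github.com/asaskevich/govalidator").
var rxEmailLax = regexp.MustCompile(`^.+@.+$`)

func init() {
	govalidator.TagMap["email_lax"] = govalidator.Validator(func(str string) bool {
		return rxEmailLax.MatchString(str)
	})
}
```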
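To make the trade-off concrete, here is a standalone sketch that uses only the standard library `regexp` package (no govalidator import) to run the lax pattern `^.+@.+$` against a few inputs. The helper name `isEmailLax` is hypothetical. The pattern accepts the quoted and IP-literal examples above, but it also accepts obvious junk such as two `@` signs or embedded spaces, which is the concern behind calling it oversimplified.

```go
package main

import (
	"fmt"
	"regexp"
)

// rxEmailLax is the lax pattern proposed above: at least one character,
// an "@", then at least one more character. In Go's RE2 syntax "." does
// not match a newline unless the (?s) flag is set.
var rxEmailLax = regexp.MustCompile(`^.+@.+$`)

// isEmailLax is a hypothetical helper wrapping the lax pattern.
func isEmailLax(s string) bool {
	return rxEmailLax.MatchString(s)
}

func main() {
	for _, s := range []string{
		`"Fred Bloggs"@example.com`, // quoted local part: accepted
		"allen@[127.0.0.1]",         // IP literal: accepted
		"a@b@c",                     // two "@" signs: also accepted
		"not an email @ all",        // embedded spaces: also accepted
		"no-at-sign.example.com",    // no "@": rejected
		"@example.com",              // empty local part: rejected
	} {
		fmt.Printf("%-28q %v\n", s, isEmailLax(s))
	}
}
```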
Fair enough. What about improving the current regex to cover more of the above examples?
Let me ask you this: what real-world email address that you currently need to handle failed the validator? The list above is, to be frank, a bit weird.
Looking at them, here's the list with checkboxes – unchecked means it fails validation:
- [x] [email protected]
- [x] [email protected]
- [ ] [email protected]
- [x] [email protected]
- [ ] " "@example.com
- [ ] "Abc\@def"@example.com
- [ ] "Fred Bloggs"@example.com
- [ ] "Joe\\Blow"@example.com
- [ ] "Abc@def"@example.com
- [x] customer/[email protected]
- [x] [email protected]
- [x] !def!xyz%[email protected]
- [x] [email protected]
- [ ] much."more\ unusual"@example.com
- [ ] very.unusual."@"[email protected]
- [ ] very."(),:;<>[]".VERY."very@\\ "very"[email protected]
- [x] !#$%&'*+-/=?^_`{}|[email protected]
- [x] Miles.O'[email protected]
- [x] postmaster@☁→❄→☃→☀→☺→☂→☹→✝.ws
- [ ] allen@[127.0.0.1]
- [ ] allen@[IPv6:0:0:1]
- [ ] root@localhost
- [ ] john@com
Which of the unchecked ones would you like fixed first? Looking at the regex, I see that it's a monster. I'll look for an alternative as well.
Interesting. GitHub tries to autolink these addresses, and the validator succeeds on a few that GitHub does not link as a single email address (it links only parts of them). None of the unchecked ones are recognized by GitHub either.
I guess this depends on how much more complicated you want the regex to get... You could probably handle the [email protected] address without much difficulty, but everything else looks quite tough. And whether you want to handle the last four (the IP-based ones and the domains lacking a dot) might be debatable for some of your users.
You're right. To be honest, I have never met anyone with an email address like the unchecked ones above. Emojis, backslashes, … – who remembers this stuff? :D
Oh, one jumped out at me from the corporate world: "Fred Bloggs"@example.com. I have seen this kind once.
Waiting on @asaskevich to chime in.
It's better to follow the RFC 😄 But I'm not sure this regex will correctly accept valid emails and reject invalid ones, as the RFC defines them.
@asaskevich any reason why `mail.ParseAddress` isn't being used?
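For reference, here is a minimal sketch of what delegating to the standard library could look like. `mail.ParseAddress` from `net/mail` parses a single RFC 5322 address; the wrapper name `isEmailStdlib` is hypothetical and not part of govalidator. Note that the parser also accepts the display-name form (`Name <addr>`), not just a bare address.

```go
package main

import (
	"fmt"
	"net/mail"
)

// isEmailStdlib is a hypothetical helper: it reports whether s parses as a
// single RFC 5322 address using the standard library's net/mail parser.
func isEmailStdlib(s string) bool {
	_, err := mail.ParseAddress(s)
	return err == nil
}

func main() {
	for _, s := range []string{
		"user@example.com",
		"user+tag@gmail.com",        // plus addressing
		`"Fred Bloggs"@example.com`, // quoted local part
		"root@localhost",            // no dot in the domain
		"Alice <alice@example.com>", // display-name form is accepted too
		"Abc.example.com",           // no "@": rejected
	} {
		fmt.Printf("%-28q %v\n", s, isEmailStdlib(s))
	}
}
```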
Instead, it would be better to use the regexp provided in the HTML5 specification for e-mail input fields (https://www.w3.org/TR/html5/forms.html#valid-e-mail-address). The HTML5 spec explains that it deliberately violates RFC 5322, because the RFC's syntax is simultaneously too strict, too vague, and too lax to be of practical use there.
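Below is a sketch of that HTML5 pattern transcribed into Go's `regexp` syntax (my own transcription, so please double-check it against the spec). The helper name `isEmailHTML5` is hypothetical. With this pattern, plus addressing, multi-label and single-character domains, and dotless domains such as `root@localhost` are accepted, while quoted local parts and IP literals are rejected.

```go
package main

import (
	"fmt"
	"regexp"
)

// html5EmailPattern is the HTML5 "valid e-mail address" pattern in Go syntax.
// It is an interpreted (double-quoted) string because the first character
// class contains a backtick, which a raw string literal cannot hold.
const html5EmailPattern = "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@" +
	"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +
	"(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"

var rxEmailHTML5 = regexp.MustCompile(html5EmailPattern)

// isEmailHTML5 is a hypothetical helper name, not part of govalidator.
func isEmailHTML5(s string) bool {
	return rxEmailHTML5.MatchString(s)
}

func main() {
	for _, s := range []string{
		"user+tag@gmail.com",        // accepted
		"x@a.b.example.com",         // multi-label domain: accepted
		"a@b.c",                     // single-character labels: accepted
		"root@localhost",            // dotless domain: accepted
		`"Fred Bloggs"@example.com`, // quoted local part: rejected
		"allen@[127.0.0.1]",         // IP literal: rejected
	} {
		fmt.Printf("%-28q %v\n", s, isEmailHTML5(s))
	}
}
```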
@epelc Is "Alice <[email protected]>" really something you want to accept in a field for an e-mail address?
`IsEmail` is a matcher, not an extractor like `mail.ParseAddress`.
@dolmen I was thinking you could use the `Address` field. It's definitely not ideal, though.
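One way to read the "use the `Address` field" idea, sketched under my own assumptions: parse with `mail.ParseAddress`, then accept the input only if the parsed `Address` is exactly the input string, which rejects display names and angle brackets. The helper name `isBareAddress` is hypothetical. If I read `net/mail` correctly, `Address` stores a quoted local part without its quotes, so quoted addresses such as "Fred Bloggs"@example.com would also fail this equality check.

```go
package main

import (
	"fmt"
	"net/mail"
)

// isBareAddress is a hypothetical helper: it parses s with net/mail and
// accepts it only if the parsed Address field equals the whole input, so any
// surrounding display name or angle brackets cause a rejection.
func isBareAddress(s string) bool {
	addr, err := mail.ParseAddress(s)
	if err != nil {
		return false
	}
	return addr.Address == s
}

func main() {
	for _, s := range []string{
		"alice@example.com",         // accepted
		"Alice <alice@example.com>", // rejected: display name present
		"<alice@example.com>",       // rejected: brackets are not part of Address
		"not-an-address",            // rejected: parse error
	} {
		fmt.Printf("%-28q %v\n", s, isBareAddress(s))
	}
}
```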
I wasn't aware HTML5 had a different spec for email formats.
On using regular expressions to parse email addresses: https://drewdevault.com/2017/08/13/When-not-to-use-a-regex.html
.mil addresses are not recognized. I also suggest a simpler regex for this. I like the advice on this site: https://www.regular-expressions.info/email.html, though the HTML definition mentioned above also looks good.
I see the email validation was updated sometime after this thread. Was a standard settled on? Currently, it marks entries such as `"]"@bar.com`, `")"@bar.com`, `&@bar.com`, `"@bar.com`, `'@bar.com`.
Single character domains also seem to fail, like [email protected], while [email protected] is ok.
Maybe use the regex used by HTML5 browsers to validate type=email inputs: https://www.w3.org/TR/html5/forms.html#valid-e-mail-address
We had a problem with email addresses that had a three-component domain, e.g. [email protected]. We solved it by switching to the HTML5 regex.
Hello guys! I forked this package because the owner disappeared. I hope he comes back; if he does, it will be easier to merge these changes back. Link to my repo: create an issue there and we'll discuss it.
@sergeyglazyrindev maybe link to your repo?
Hi @ptman, here it is: https://github.com/sergeyglazyrindev/govalidator
Also, I get a `does not validate as email` error when using Gmail-style addresses with a plus sign (e.g. [email protected]).