govalidator

Email validation regex too strict

Open • veqryn opened this issue 8 years ago • 21 comments

The long and complicated regex you are currently using to validate email addresses is far too strict compared with what the RFC for email addresses allows.

Below is a list of valid email addresses, fewer than half of which your regex matches.

[email protected]
[email protected]
[email protected]
[email protected]
" "@example.com
"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/[email protected]
[email protected]
!def!xyz%[email protected]
[email protected]
much."more\ unusual"@example.com
very.unusual."@"[email protected]
very."(),:;<>[]".VERY."very@\\ "very"[email protected]
!#$%&'*+-/=?^_`{}|[email protected]
Miles.O'[email protected]
postmaster@☁→❄→☃→☀→☺→☂→☹→✝.ws
allen@[127.0.0.1]
allen@[IPv6:0:0:1]
root@localhost
john@com

I'd recommend changing the regex to just this: ^.+@.+$

veqryn • May 18 '16

I'm against changing the regex to such an oversimplified one. You can add your own struct validation tag or your own custom validation function instead:

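  // Needs imports "regexp" and "github.com/asaskevich/govalidator".
  var rxEmailLax = regexp.MustCompile("^.+@.+$")

  // Register the lax pattern under the custom tag "email_lax" at package init.
  func init() {
    govalidator.TagMap["email_lax"] = govalidator.Validator(func(str string) bool {
      return rxEmailLax.MatchString(str)
    })
  }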
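To make the "oversimplified" objection concrete, here is a small standard-library-only Go sketch (not part of govalidator; the sample inputs are illustrative picks, not from the thread) that runs the proposed `^.+@.+$` pattern against a few strings. It accepts anything with at least one character on each side of an @, including a display-name form and a string with two @ signs.

```go
package main

import (
	"fmt"
	"regexp"
)

// rxEmailLax is the "anything@anything" pattern proposed in this thread.
var rxEmailLax = regexp.MustCompile("^.+@.+$")

func main() {
	inputs := []string{
		"john@com",                  // dotless domain from the list above
		"Alice <alice@example.com>", // display-name form
		"a@b@c",                     // two @ signs
		" @ ",                       // whitespace on both sides
		"no-at-sign",                // no @ at all
	}
	for _, s := range inputs {
		fmt.Printf("%q -> %v\n", s, rxEmailLax.MatchString(s))
	}
}
```

Only the last input is rejected. That is the trade-off: the lax pattern will not reject any address from the list above, but it also accepts obvious non-addresses.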

annismckenzie • May 19 '16

Fair enough. What about improving the current regex to cover more of the above examples?

veqryn • May 19 '16

Let me ask you this: what real-world email address that you currently need to handle failed the validator? The list above is, to be frank, a bit weird.

Looking at them, here's the list with checkboxes – unchecked means it fails validation:

Which of the unchecked ones would you like fixed first? Looking at the regex, I see that it's a monster. I'll look for an alternative as well.

annismckenzie • May 20 '16

Interesting. GitHub tries to autolink these addresses, and the validator accepts a few that GitHub does not link as a single email address (it links only parts of them). None of the unchecked ones are recognized by GitHub either.

annismckenzie • May 20 '16

I guess this sort of depends on how much more complicated you want the regex to get... You could probably handle the [email protected] address without much difficulty, but everything else looks quite tough to handle. And whether you want to handle the last four (the IP-based ones and the domains lacking a dot) might be debatable for some of your users.

veqryn • May 20 '16

You're right. To be honest, I have never met anyone with an email address like the unchecked ones above. Emojis, backslashes, … – who remembers this stuff? :D

Oh, one from the corporate world jumped out at me: "Fred Bloggs"@example.com. I have seen an address like that once.

Waiting on @asaskevich to chime in.

annismckenzie • May 20 '16

Better to follow the RFC 😄 But I'm not sure this regex would correctly accept the emails the RFC says are valid and reject the ones it says are invalid.

asaskevich • Jul 15 '16

@asaskevich any reason why mail.ParseAddress isn't being used?

epelc • Jul 19 '16

It would be better to use the regexp provided in the HTML5 specification for e-mail input fields instead (https://www.w3.org/TR/html5/forms.html#valid-e-mail-address). The HTML5 spec explains why RFC 5322 is broken for this purpose and should not be followed.
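For reference, a Go sketch of that approach. The pattern below is a transcription of the HTML spec's "valid e-mail address" definition into a Go string (double-quoted because the pattern contains a backtick), so compare it with the linked spec before relying on it; isEmailHTML5 is a hypothetical helper name, not a govalidator function.

```go
package main

import (
	"fmt"
	"regexp"
)

// rxEmailHTML5 transcribes the HTML spec's <input type=email> pattern.
// The domain is one or more labels of up to 63 alphanumerics/hyphens that
// start and end with an alphanumeric; a dot in the domain is not required.
var rxEmailHTML5 = regexp.MustCompile(
	"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@" + // local part: no quotes, no spaces
		"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" + // first domain label
		"(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$") // further labels

// isEmailHTML5 is a hypothetical helper wrapping the pattern above.
func isEmailHTML5(s string) bool {
	return rxEmailHTML5.MatchString(s)
}

func main() {
	for _, s := range []string{
		"user+tag@gmail.com",        // plus addressing
		"jane@mail.example.co.uk",   // multi-label domain
		"a@b.co",                    // single-character label
		"root@localhost",            // dotless domain
		`"Fred Bloggs"@example.com`, // quoted local part
		"allen@[127.0.0.1]",         // IP literal
	} {
		fmt.Printf("%-28s %v\n", s, isEmailHTML5(s))
	}
}
```

The first four inputs match and the last two do not: the pattern allows plus addressing, multi-label and dotless domains, and single-character labels, but deliberately rejects quoted local parts and IP literals.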

dolmen • Feb 03 '17

@epelc Is "Alice <[email protected]>" really something you want to accept in a field for an e-mail address? IsEmail is a matcher, not an extractor like mail.ParseAddress.

dolmen • Feb 03 '17

@dolmen I was thinking you could take the Address field from the result. It's definitely not ideal, though.

I wasn't aware HTML5 had a different spec for email formats.
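A minimal sketch of the net/mail idea, using only Go's standard library. Since mail.ParseAddress accepts the display-name form dolmen mentions, the hypothetical isEmailViaParser helper below only accepts input whose parsed Address field equals the original string.

```go
package main

import (
	"fmt"
	"net/mail"
)

// isEmailViaParser is a hypothetical helper, not part of govalidator. It
// accepts s only if net/mail parses it and the parsed Address equals s, so
// a display-name form like "Alice <alice@example.com>" is rejected.
func isEmailViaParser(s string) bool {
	addr, err := mail.ParseAddress(s)
	if err != nil {
		return false
	}
	return addr.Address == s
}

func main() {
	inputs := []string{
		"Alice <alice@example.com>",
		"alice@example.com",
		"user+tag@gmail.com",
		"not an address",
	}
	for _, s := range inputs {
		addr, err := mail.ParseAddress(s)
		if err != nil {
			fmt.Printf("%q: parse error: %v\n", s, err)
			continue
		}
		fmt.Printf("%q: Address=%q, exact match=%v\n", s, addr.Address, isEmailViaParser(s))
	}
}
```

One caveat worth verifying before relying on this: for a quoted local part such as "Fred Bloggs"@example.com, net/mail appears to store the unquoted form in Address, so this exact-match check may reject such addresses even though they parse.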

epelc • Feb 03 '17

Using regular expressions to parse email addresses

https://drewdevault.com/2017/08/13/When-not-to-use-a-regex.html

emersion • May 04 '18

.mil addresses are not recognized. I too suggest a simpler regex for this. I like the advice on this site: https://www.regular-expressions.info/email.html, though the HTML definition mentioned above also looks good.

macrael • May 08 '18

I see the email validation was updated some time after this thread was opened. Was a standard settled on? Currently, it marks entries such as "]"@bar.com, ")"@bar.com, &@bar.com, "@bar.com, and '@bar.com.

0sc • Oct 24 '18

Single-character domains also seem to fail, like [email protected], while [email protected] is OK.

ptman • Nov 27 '18

Maybe use the regex used by HTML5 browsers to validate type=email inputs: https://www.w3.org/TR/html5/forms.html#valid-e-mail-address

ptman • Mar 13 '19

We had a problem with email addresses that had a three-component domain, e.g. [email protected]. We solved it by switching to the HTML5 regex.

ptman • Mar 20 '19

Hello guys! I forked this package because the owner has disappeared. I hope he comes back, and if he does, it will be easier to merge these changes back. Link to my repo: create an issue there and we'll discuss it.

sergeyglazyrindev • Oct 17 '21

@sergeyglazyrindev maybe link to your repo?

ptman • Oct 29 '21

Hi @ptman, here it is: https://github.com/sergeyglazyrindev/govalidator

sergeyglazyrindev • Oct 29 '21

Also, I get a "does not validate as email" error when using Gmail-style addresses with a plus sign (e.g. [email protected]).

morigs • Oct 25 '22