RegexHub
I'm sorry, but there's loads of issues...
HTML Tags
/^<([a-z1-6]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
Hex Value
/^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/
A hex HTML/CSS color value maybe, but 0xDEADBEAF is a perfectly valid hex value.
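To make the objection concrete, here is a quick check in plain JavaScript (the variable names are only for illustration). It suggests the pattern is really a CSS color matcher: it accepts 3- and 6-digit colors but rejects a prefixed hex literal that JavaScript itself parses without complaint.

```javascript
// The "Hex Value" pattern exactly as listed above.
const hexPattern = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

const cssColorOk = hexPattern.test('#1a2B3c');   // 6-digit color: accepted
const shortColorOk = hexPattern.test('fff');      // 3-digit color: accepted
const literalOk = hexPattern.test('0xDEADBEAF');  // rejected: "0x" prefix, 8 digits
const bareEightOk = hexPattern.test('DEADBEAF');  // rejected: 8 digits is neither 3 nor 6

// JavaScript's own number conversion understands the same literal.
const parsed = Number('0xDEADBEAF');

console.log(cssColorOk, shortColorOk, literalOk, bareEightOk, Number.isNaN(parsed));
```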
Password
/^[a-zA-Z0-9+_-]{6,32}$/
Slowly we're moving the world to passphrases, and everybody should be hashing their passwords anyway. So why the 32-character limit? And why, for Pete's sake, are we only allowing a-zA-Z0-9+_- and nothing else? *cries* (see also)
Email
/^([a-z0-9+_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,24})$/
Yeah. Just. No. Another famous answer
Positive number
/^\d*\.?\d+$/
We don't all live in the US/UK (1,234.56 vs. 1.234,56).
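A small sketch of what that means in practice (the sample inputs are my own): the pattern only understands an unformatted number with a dot as the decimal separator, so both the US/UK and the continental European way of writing the same amount are rejected.

```javascript
// The "Positive number" pattern exactly as listed above.
const positiveNumber = /^\d*\.?\d+$/;

const inputs = ['1234.56', '.5', '1,234.56', '1.234,56'];

// One boolean per input, in the same order as `inputs`.
const results = inputs.map((text) => positiveNumber.test(text));

inputs.forEach((text, i) => console.log(text, results[i]));
```

Only the first two inputs pass; anything with a thousands separator or a decimal comma does not.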
Phonenumber
/^\+?[\d\s]{3,}$/
+123 is a valid phone number? Where? Phone numbers are notoriously hard to validate (hence libphonenumber, for example).
Date in format dd/mm/yyyy
/^(0?[1-9]|[12][0-9]|3[01])([ \/\-])(0?[1-9]|1[012])\2(19[0-9][0-9]|20[0-9][0-9])$/
Failed the very first 'edge case' I could come up with: 30/02/2016 is accepted, and years like 1852 or 2150 fail too... (as noted elsewhere).
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Thanks @RobThree some valid points there. Though the date pattern is matching on 30/02/2016 for me.
Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.
I've now removed the password pattern as that was proving particularly controversial.
PRs are very welcome if you want to make any improvements yourself.
Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.
<b>this html</b><b>would beg the differ</b>
Though the date pattern is matching on 30/02/2016 for me
Except that Feb. 30th doesn't exist 😉
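One way to catch this is to parse the value instead of pattern-matching it. The sketch below is only an illustration: `isRealDate` is a hypothetical helper, it assumes the dd/mm/yyyy layout with a slash, and it relies on the fact that JavaScript's `Date` rolls an impossible day over into the next month, so the fields no longer round-trip.

```javascript
// The date pattern exactly as listed above.
const datePattern = /^(0?[1-9]|[12][0-9]|3[01])([ \/\-])(0?[1-9]|1[012])\2(19[0-9][0-9]|20[0-9][0-9])$/;

// Hypothetical helper: true only if dd/mm/yyyy names a day that exists.
function isRealDate(text) {
  const parts = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (!parts) return false;
  const day = Number(parts[1]);
  const month = Number(parts[2]);
  const year = Number(parts[3]);
  // 30 Feb becomes 1 Mar here, so compare the fields after construction.
  // Note: Date.UTC treats years 0-99 as 1900-1999, so those never round-trip.
  const d = new Date(Date.UTC(year, month - 1, day));
  return d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
}

const regexSaysFeb30 = datePattern.test('30/02/2016');  // pattern accepts it
const parseSaysFeb30 = isRealDate('30/02/2016');        // parsing rejects it
const regexSays1852 = datePattern.test('01/01/1852');   // pattern rejects a real date
const parseSays1852 = isRealDate('01/01/1852');         // parsing accepts it

console.log(regexSaysFeb30, parseSaysFeb30, regexSays1852, parseSays1852);
```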
Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.
Except that there are a gazillion ways the regex will match incorrectly (demonstrated here) or cause trouble otherwise. Have you read the Stack Overflow answer I linked to?
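For anyone who wants to reproduce one of those incorrect matches locally, here is a minimal check using the two-sibling example from earlier in this thread. The pattern treats the whole string as a single `<b>` element, and the captured "content" swallows the closing tag of the first element and the opening tag of the second.

```javascript
// The "HTML Tags" pattern exactly as listed above.
const htmlTag = /^<([a-z1-6]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

// Two sibling elements, not one.
const input = '<b>this html</b><b>would beg the differ</b>';

const match = htmlTag.exec(input);
const tagName = match ? match[1] : null;  // first capture group: the tag name
const inner = match ? match[3] : null;    // third capture group: the "content"

console.log(tagName);
console.log(inner);
```

The nested quantifier in `([^<]+)*` is also the kind of construct that can backtrack heavily on longer non-matching input, which is part of the "cause trouble otherwise".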
PRs are very welcome if you want to make any improvements yourself.
All the ones I pointed out are very case-specific and hard, if not impossible (html, email for example), to get correct. Though I can think of improvements here-and-there I'd suggest taking them all down; for most, if not all, of the regexes there are better ways of handling and validating the inputs (like simply parsing a date(time) value to 'validate' it or sending an activation e-mail to verify an e-mail address).
Regexes do have their use, I'm not saying they don't. But, as said, for most (if not all) of the examples there are much better solutions.
Edit: Here's more I just stumbled upon.
Re: Emails. The only true way to validate emails is with basic pattern matching. Something along the lines of looking for @.* is the most you can possibly hope to do.
I completely agree with Rob on that point.
@CSobol email pattern has now been updated with this PR: https://github.com/lukehaas/RegexHub/pull/15
The time pattern also lacks a ^ and $, just like the date one; without them it matches "4:00" when you input "24:00".
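The effect of the missing anchors can be sketched like this. Note that `looseTime` is a hypothetical 24-hour pattern written for this example, not the one from the site; the point is only what ^ and $ change.

```javascript
// Hypothetical 24-hour time pattern (an assumption, not the site's pattern).
const looseTime = /([01]?[0-9]|2[0-3]):[0-5][0-9]/;      // no anchors
const strictTime = /^([01]?[0-9]|2[0-3]):[0-5][0-9]$/;   // same pattern, anchored

// Without anchors the engine is free to skip the leading "2".
const looseMatch = looseTime.exec('24:00');
const looseText = looseMatch ? looseMatch[0] : null;

const strictOnInvalid = strictTime.test('24:00');  // whole string must be a time
const strictOnValid = strictTime.test('23:59');

console.log(looseText, strictOnInvalid, strictOnValid);
```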
I seem to run into a bug with the pattern document.body.innerHTML=flags//whoops
;)
For email, several regexes can help to filter out some bad formats.
Lots of sites still expect 'simple' emails, e.g. a maximum of 3 characters for the TLD (.com)! The question is whether you want a valid one or one that will work on almost all sites.
Few filters
Maximum length: 254, a limit that comes from the transport protocol (SMTP) rather than from the email address syntax; search the RFCs... Minimum length: 7, like [email protected]
.{7,254}
Rough validation of min/max length blocks:
.{1,248}@.{2,250}\..{2,64}
Tightening this formula so that all three lengths are enforced together is probably impossible in a regex alone, because you need to know the length of each part; you have to use JavaScript, not just a regex.
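A rough sketch of what "use JavaScript" could look like follows. The function name is hypothetical, and the limits are an assumption on my part: they are the commonly cited SMTP limits (254 for the whole address, 64 for the local part, 255 for the domain, 63 per DNS label), not the numbers used in the pattern above.

```javascript
// Hypothetical helper: checks lengths only, not which characters are used.
// Assumed limits: 254 total, 64 local part, 255 domain, 63 per domain label.
// `.length` counts UTF-16 code units, so non-ASCII input is only approximated.
function hasPlausibleLengths(address) {
  const at = address.lastIndexOf('@');
  // Need at least one character on each side of the last "@".
  if (at < 1 || at === address.length - 1) return false;

  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  const labels = domain.split('.');

  return address.length <= 254 &&
    local.length <= 64 &&
    domain.length <= 255 &&
    labels.every((label) => label.length >= 1 && label.length <= 63);
}

console.log(hasPlausibleLengths('a@b.co'));
console.log(hasPlausibleLengths('x'.repeat(65) + '@example.com'));
```

Splitting on the last @ keeps the check simple; whether the remaining characters are acceptable is a separate question.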
Just for Latin char set, supposing case insensitive is set (/..../i):
[a-z][a-z0-9\._-]{0,246}[a-z0-9]@[a-z][a-z0-9\._-]{0,248}[a-z0-9]\.[a-z][a-z0-9\.]{0,61}[a-z]
(Should verify the above one)
A bit more international, but invalid characters are not filtered out (apart from spaces, tabs and, I think, control characters other than DEL):
[!-\uFFEF]{1,248}@[!-\uFFEF]{2,250}\.[!-\uFFEF]{2,64}
64 is the maximum; the longest TLD that exists today is 24 characters: XN--VERMGENSBERATUNG-PWB (see http://data.iana.org/TLD/tlds-alpha-by-domain.txt)
Few links
To test your suppositions: http://cobisi.com/support/kb/emailverify.net/verification-process/validation-levels
Free Mailgun.com validation api, not just the syntax: https://www.mailgun.com/email-validation
Explanation of unicode in regex: http://www.regular-expressions.info/unicode.html
For the lazy, this one is from a framework; I don't remember which one... But Mailgun is OK. Apparently it respects all the rules, except that it does not check the length (see above).
function is_valid_email_address(email_address) {
  // Syntax check only: dot-atom or quoted-string local part, then dot-separated domain labels. Lengths are not checked.
  var pattern = new RegExp(/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i);
  return pattern.test(email_address);
}
The question is whether you want a valid one or one that will work on almost all sites.
That's an easy answer. When it comes to email addresses, you never want to stop a valid user from signing up with their email address. You would much rather take a hundred junk email addresses than prevent one valid user from signing up or filling out a form.
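In code, that philosophy could look something like the sketch below. `looksLikeEmail` is a hypothetical name and the rule is deliberately lax: something before the last @ and something after it. Whether the address really works is left to a confirmation email, which is not shown here.

```javascript
// Hypothetical, deliberately permissive check: it only rules out input that
// cannot possibly be an address. Real verification happens by sending mail.
function looksLikeEmail(input) {
  const value = input.trim();
  const at = value.lastIndexOf('@');
  // At least one character before the last "@" and one after it.
  return at > 0 && at < value.length - 1;
}

console.log(looksLikeEmail('someone@example.museum'));
console.log(looksLikeEmail('"odd but quoted"@example.com'));
console.log(looksLikeEmail('not an address'));
```

This will let plenty of junk through, which is the accepted trade-off: the junk is cheap, a rejected real user is not.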