email-rfc2822-validator icon indicating copy to clipboard operation
email-rfc2822-validator copied to clipboard

[question] Parsing problem with '=?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?= <[email protected]>'

Open aanno opened this issue 4 years ago • 14 comments

I tried the library on many mail addresses without problems but =?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?= <[email protected]> seems to confuse EmailAddressParser. I have no if it is valid w.r.t. rfc2822, however..

aanno avatar Mar 28 '20 09:03 aanno

That address looks valid to me... maybe @bbottema will have a comment about why it isn't working.

chconnor avatar Mar 28 '20 17:03 chconnor

I double checked and this address indeed doesn't pass the test. I have not clue why, somewhere the following awesomeballz regex fails to recognize this address as valid:

(((?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09
\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x
0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0
-9!#-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~
]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)
?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[
 \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x0
1-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08
\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?"(
?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[
\t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\
x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0
C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))(?:(?:(
?:[ \t]*\r\n)?[ \t]+)(?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]
-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]
+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)
?[ \t]+)))?[a-zA-Z0-9!#-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-
\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:
[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|
(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'
*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)
?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]
*\r\n)?[ \t]+)))?"(?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(
?:(?:[ \t]*\r\n)?[ \t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-
~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+
)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?
[ \t]+)))?)))*)??((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\
[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-
\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+))
)?<((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B
\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x
0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9!
#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x
0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?
\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[
 \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F
\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t
]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(
?:[ \t]*\r\n)?[ \t]+)))?("(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x
7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!
-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\
n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \
t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]
|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?
[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[
\t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?
[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:
[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))
*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\
x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \
t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r
\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\[(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-Z^-~]|(?:\\[\x01-\
x09\x0B\x0C\x0E-\x7F]))+)*(?:(?:[ \t]*\r\n)?[ \t]+)?])(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0
B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\
((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[
\t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)>((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x
7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*
\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:
[ \t]*\r\n)?[ \t]+)))?))|(((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]
-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]
+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)
?[ \t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\
n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:
(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F
]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x0
1-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?
[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]
*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?("(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\
x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x0
8\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]
+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n
)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x
0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:
(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\
))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\(
(?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \
t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x0
9\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*
\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:
(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\
x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\[(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7
F!-Z^-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))+)*(?:(?:[ \t]*\r\n)?[ \t]+)?])((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[
\t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(
?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))

:smiley:

bbottema avatar Mar 28 '20 20:03 bbottema

It validates with the default isValid() though, which is not strict RFC

So one of the following flags makes it fail:

  • ALLOW_DOMAIN_LITERALS
  • ALLOW_DOT_IN_A_TEXT
  • ALLOW_SQUARE_BRACKETS_IN_A_TEXT

bbottema avatar Mar 28 '20 20:03 bbottema

Ok, so ALLOW_SQUARE_BRACKETS_IN_A_TEXT breaks the validation. However, the intricate details of these regexes are beyond my knowledge. I don't understand what brackets have to do with this particular email address.

bbottema avatar Mar 28 '20 20:03 bbottema

This line has \\[], should that be \\[\\] ?

It's been ages since I was in this stuff, so just taking shots here. :-)

chconnor avatar Mar 28 '20 20:03 chconnor

Yeah I was just testing that myself, but doesn't seem to matter.

bbottema avatar Mar 28 '20 20:03 bbottema

Hmmm... seems like an uncommented ] there would really throw a wrench in... I'm surprised it didn't matter...

chconnor avatar Mar 28 '20 21:03 chconnor

Yeah, maybe the regex engine is smart enough to look for an outer bracket of character ranges.

Anyway I escaped it now and committed it as well as this particular case as a junit test, which as of now breaks the build. Perhaps you can have a look, it's ninny time for me!

Oh and I'm pulling the parent card again; a month ago I became father for the second time 😄😄

bbottema avatar Mar 28 '20 21:03 bbottema

Congrats!

For when you return -- how about the line before -- shouldn't that dot be escaped? \\.

chconnor avatar Mar 28 '20 21:03 chconnor

No, as it is already inside a character range (ie. [a-z1-9.,+*{}()]), everything is regarded as literal including operators. The only exception might be the bracket itself, which I've now escaped just to be sure (I tested it just to make sure and indeed it doesn't change the faulty outcome).

bbottema avatar Mar 28 '20 21:03 bbottema

Ok, I just found out it's not so much that the presence of ALLOW_SQUARE_BRACKETS_IN_A_TEXT causes it, but the absence of the others cause it. Using an empty criteria also causes it.

After zooming in on this, I found that leaving ALLOW_QUOTED_IDENTIFIERS out causes the problem. As soon as that is included in the criteria (in any combination), the problem goes away.

Unfortunately, the else-branch for ALLOW_SQUARE_BRACKETS_IN_A_TEXT being in the criteria covers a swath of code, so that will be more difficult to analyse right now.

if (criteria.contains(EmailAddressCriteria.ALLOW_QUOTED_IDENTIFIERS)) {
	(..) // no problems in this case
} else {
	// somewhere the following code breaks our use case
	
	// no quoted identifiers, yes|no domain literals
	local_part_da = m.group(3);
	if (local_part_da == null) {
		local_part_qs = m.group(4);
	}
	domain_part_da = m.group(5);
	if (domain_part_da == null && allowDomainLiterals) {
		domain_part_dl = m.group(6);
	}
	current_localpart = local_part_da == null ? local_part_qs : local_part_da;
	current_domainpart = domain_part_da == null ? domain_part_dl : domain_part_da;
	if (extractCfwsPersonalNames) {
		personal_string = m.group((allowDomainLiterals ? 1 : 0) + 6);
		personal_string = removeAnyBounding('(', ')', getFirstComment(personal_string, criteria));
	}
}

bbottema avatar Mar 28 '20 21:03 bbottema

Ok last one from me... on this line, should the ^ be \\ escaped? Given that it's used for negation inside ranges? Or does it get interpreted as a literal because it's followed by a -?

Anyway sounds like you've traced the issue elsewhere, just thought I'd check.

chconnor avatar Mar 28 '20 21:03 chconnor

From regular-expressions.info on Character Classes or Character Sets:

To include an unescaped caret as a literal, place it anywhere except right after the opening bracket. [x^] matches an x or a caret. This works with all flavors discussed in this tutorial.

So that's fine.

bbottema avatar Mar 28 '20 21:03 bbottema

Righto. Good luck with the bug!

chconnor avatar Mar 28 '20 21:03 chconnor