interweave
interweave copied to clipboard
email matcher fails when email address is in a question
Email matcher handles email addresses that end in question marks as follows:
INPUT: "Can you please send an email to [email protected]?"
OUTPUT: "can you email {{{url0}}}[email protected]?{{{/url0}}}"
PROBLEM: that question mark is doing something to the regex, which matches correctly without it.
Note: I git cloned the repo and tried to take a closer look, but was unable to get any unit tests running. The rest of this message details problems related to dev setup rather than email matcher:
npm test -- packages/autolink/tests/EmailMatcher.test.ts
returns
> [email protected] pretest /Users/michaelkantor/Documents/interweave
> yarn run type
yarn run v1.22.4
$ beemo typescript --reference-workspaces --build
[1/2] CONFIG Creating config files
[2/2] DRIVER Running TypeScript v4.4.3 driver (failed)
packages/ssr/src/index.ts(138,10): error TS7017: Element implicitly has an 'any' type because type 'typeof globalThis' has no index signature.
Failed to execute pipeline. The following errors have occurred:
Command failed with exit code 1: tsc --build
at makeError (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/lib/error.js:56:11)
at handlePromise (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/index.js:114:26)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
Some further investigations:
Fails to get flagged as invalid
Input: "can you email [email protected]?"
Url Matcher returns: '[email protected]?' rather than '[email protected]'
if (response != null && response.match.match(EMAIL_DISTINCT_PATTERN)) {
response.valid = false;
}
- INPUT: [email protected]? RESULT: valid =
true - INPUT: [email protected]. RESULT: valid =
false
URL Matcher vs Email Matcher
So... why can't I replicate this in unit tests? Well... it turns out that if we have EmailMatcher BEFORE URL Matcher, everything just works. Unit tests were setup that way, and resulted in successful test after test, causing much confusion.
Potential Fixes:
- Update the regex so that "[email protected]?" is never returned by the URL Matcher's doMatch call. I think that any URL that ends with
?probably does not expect?to be part of the url, its probably just a question. The rare case where we are wrong... is really on the web engineer who hosts that unusual site that requires a lone?at the end. - Always force Email Matcher to come first.
- Make
response.match.match(EMAIL_DISTINCT_PATTERN)more forgiving of extra stuff captured by the URL matcher. For example, removing the$from the end of the regular expression so that ifresponsematches rather than is an email address, its treated as an invalid url match.
All that said: my own code can move EmailMatcher to be the first matcher, and that appears to resolve the issue for my usage.
Unfortunately [email protected]? is a valid URL, as it's merely a shorthand for http://[email protected]?. The mkantor@ is the "auth" while ? is an empty query string.
Let me think about this a bit more.
Any update on this issue? Running into this as well.