interweave icon indicating copy to clipboard operation
interweave copied to clipboard

email matcher fails when email address is in a question

Open michaelkantor opened this issue 4 years ago • 3 comments

Email matcher handles email addresses that end in question marks as follows:

INPUT: "Can you please send an email to [email protected]?"

OUTPUT: "can you email {{{url0}}}[email protected]?{{{/url0}}}"

PROBLEM: that question mark is doing something to the regex, which matches correctly without it.

Note: I git cloned the repo and tried to take a closer look, but was unable to get any unit tests running. The rest of this message details problems related to dev setup rather than email matcher:

npm test -- packages/autolink/tests/EmailMatcher.test.ts

returns

> [email protected] pretest /Users/michaelkantor/Documents/interweave
> yarn run type

yarn run v1.22.4
$ beemo typescript --reference-workspaces --build
[1/2] CONFIG Creating config files
[2/2] DRIVER Running TypeScript v4.4.3 driver (failed)

packages/ssr/src/index.ts(138,10): error TS7017: Element implicitly has an 'any' type because type 'typeof globalThis' has no index signature.

Failed to execute pipeline. The following errors have occurred:

Command failed with exit code 1: tsc --build


    at makeError (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/lib/error.js:56:11)
    at handlePromise (/Users/michaelkantor/Documents/interweave/node_modules/@boost/core/node_modules/execa/index.js:114:26)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

michaelkantor avatar Oct 07 '21 19:10 michaelkantor

Some further investigations:

Fails to get flagged as invalid

Input: "can you email [email protected]?"

Url Matcher returns: '[email protected]?' rather than '[email protected]'

if (response != null && response.match.match(EMAIL_DISTINCT_PATTERN)) {
      response.valid = false;
}

URL Matcher vs Email Matcher

So... why can't I replicate this in unit tests? Well... it turns out that if we have EmailMatcher BEFORE URL Matcher, everything just works. Unit tests were setup that way, and resulted in successful test after test, causing much confusion.

Potential Fixes:

  1. Update the regex so that "[email protected]?" is never returned by the URL Matcher's doMatch call. I think that any URL that ends with ? probably does not expect ? to be part of the url, its probably just a question. The rare case where we are wrong... is really on the web engineer who hosts that unusual site that requires a lone ? at the end.
  2. Always force Email Matcher to come first.
  3. Make response.match.match(EMAIL_DISTINCT_PATTERN) more forgiving of extra stuff captured by the URL matcher. For example, removing the $ from the end of the regular expression so that if response matches rather than is an email address, its treated as an invalid url match.

All that said: my own code can move EmailMatcher to be the first matcher, and that appears to resolve the issue for my usage.

michaelkantor avatar Oct 11 '21 22:10 michaelkantor

Unfortunately [email protected]? is a valid URL, as it's merely a shorthand for http://[email protected]?. The mkantor@ is the "auth" while ? is an empty query string.

Let me think about this a bit more.

milesj avatar Mar 04 '22 23:03 milesj

Any update on this issue? Running into this as well.

daria-github avatar Aug 10 '23 21:08 daria-github