App icon indicating copy to clipboard operation
App copied to clipboard

[$1000] Can't parse deep link with email param.

Open kavimuru opened this issue 1 year ago • 52 comments

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Action Performed:

  1. Try to open user detail from deep link on browser (for example: https://staging.new.expensify.com/[email protected]).
  2. Now go to any report, paste that link and click send.
  3. Notice that it can't parse the whole link, and when user try to click on it, it opens the email app with the wrong email filled.

Expected Result:

App should parse the deep link detail with email param.

Actual Result:

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

Workaround:

Can the user still use Expensify without this being fixed? Have you informed them of the workaround?

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • [ ] Android / native
  • [ ] Android / Chrome
  • [ ] iOS / native
  • [ ] iOS / Safari
  • [x] MacOS / Chrome / Safari
  • [ ] MacOS / Desktop

Version Number: 1.2.92-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation

https://user-images.githubusercontent.com/43996225/228922658-ef1627e6-9f3e-4f91-a52a-8cb26a5ea47f.mov

Untitled

Expensify/Expensify Issue URL: Issue reported by: @hungvu193 Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1680191331185739

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~013659e025d6ce0e4f
  • Upwork Job ID: 1644482548992663552
  • Last Price Increase: 2023-04-21

kavimuru avatar Mar 30 '23 17:03 kavimuru

Triggered auto assignment to @sakluger (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot avatar Mar 30 '23 17:03 MelvinBot

Bug0 Triage Checklist (Main S/O)

  • [x] This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
  • [x] This bug is not a duplicate report (check E/App issues and #expensify-bugs)
    • If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
  • [x] This bug is reproducible using the reproduction steps in the OP. S/O
    • If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
    • If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
  • [x] This issue is filled out as thoroughly and clearly as possible
    • Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
  • [x] I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

MelvinBot avatar Mar 30 '23 17:03 MelvinBot

@sakluger Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot avatar Apr 03 '23 09:04 MelvinBot

@sakluger Huh... This is 4 days overdue. Who can take care of this?

MelvinBot avatar Apr 05 '23 09:04 MelvinBot

@sakluger Still overdue 6 days?! Let's take care of this!

MelvinBot avatar Apr 07 '23 09:04 MelvinBot

I was able to reproduce this issue, feels like a bug worth solving!

sakluger avatar Apr 07 '23 23:04 sakluger

Job added to Upwork: https://www.upwork.com/jobs/~013659e025d6ce0e4f

MelvinBot avatar Apr 07 '23 23:04 MelvinBot

Current assignee @sakluger is eligible for the External assigner, not assigning anyone new.

MelvinBot avatar Apr 07 '23 23:04 MelvinBot

Triggered auto assignment to Contributor-plus team member for initial proposal review - @0xmiroslav (External)

MelvinBot avatar Apr 07 '23 23:04 MelvinBot

Triggered auto assignment to @johnmlee101 (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

MelvinBot avatar Apr 07 '23 23:04 MelvinBot

I couldn't reproduce this issue. I am not sure how exactly you are copying the url. Can you provide more details, or elaborate on copying step a little bit more

alitoshmatov avatar Apr 07 '23 23:04 alitoshmatov

Proposal

Please re-state the problem that we are trying to solve in this issue.

Deep links which contain an email address are incorrectly parsed thereby resulting in broken links in the chat history.

What is the root cause of that problem?

The ExpensiMark parser is processing the autoEmail rule before the autoLink rule, which results in a broken link, because the email part of the deep link is parsed first, hence the full link will never be recognised. The currently defined order is like so: https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L95-L130

What changes do you think we should make in order to solve the problem?

Update the autoEmail rule like so:

{
    name: 'autoEmail',
    regex: new RegExp(
        `(?![^<]*>|[^<>]*<\\/)(^| |(?<!http(s?):\\S*))${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
        'gim',
    ),
    replacement: (match, g1, g2, g3) => {
        let repl = `<a href="mailto:${g3}">${g3}</a>`;
        if (match.startsWith(' ')) {
            repl = ' ' + repl;
        }
        return repl;
    }
}

This will make sure that it only matches an email that starts at the beginning of the line, or if there's a space preceding the email, thereby excluding emails that are already in links. The second capture group contains the matching email, so we use that instead of match, which may include a space. The replacement string is conditionally updated to be prefixed with a space based on the match.

This has been tested and works well with the following scenarios:

[email protected] https://www.expensify.com
bruh [email protected] https://www.expensify.com
https://www.expensify.com [email protected]
https://staging.new.expensify.com/details?login= [email protected] bacon [email protected]
multiple spaces https://www.expensify.com    [email protected]
[test ]]([email protected])
another scenario [[email protected]]
checking boundaries \[email protected]
https://staging.new.expensify.com/details?name=test&email= [email protected]
[email protected] last check
[email protected] [email protected] tests fine
https://www.expensify.com?name=test&[email protected]
[email protected] https://www.expensify.com?name=test&[email protected] [email protected]

Please note that the faulty parsed HTML result (with the broken links) occurs at the composer end, before the data is sent to the API, so previous messages that were already malformed will still remain the same way even after updating the ExpensiMark class implementation. A user would have to edit these messages after the changes have been applied in order to fix the broken links.

What alternative solutions did you explore? (Optional)

None.

akinwale avatar Apr 08 '23 00:04 akinwale

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and the rest is in mailto protocol => we link incorrectly.

What is the root cause of that problem?

Context: We're using an internal library to parse a message from text to html in this line

The issue described in GH issue happens only when users send a plain-text url that contain email as a param. For more details, with a plain-text url string, when it's passed into our parser, it will reach this parser's rule autoEmail. As described in description, this rule tries to "automatically links emails that are not in a link". But it only works with HTML tag but not raw url string.

i.e Given input is https://staging.new.expensify.com/[email protected] => matching text of this rule will be //staging.new.expensify.com/[email protected] => output after this rule is <a href="mailto://staging.new.expensify.com/[email protected]">//staging.new.expensify.com/[email protected]</a> => it leads to the out of whole parser is https:<a href="mailto://staging.new.expensify.com/[email protected]">//staging.new.expensify.com/[email protected]</a>

What changes do you think we should make in order to solve the problem?

To fix this issue, we need to:

  1. Update the regex rule of autoEmail so that it includes optional substring regex ((?:\\S*=)?)
  2. Update replacement of autoEmail rule so that, if the matching string is an URL, we don't need to format to an email link.
Code example

Replace those LOC by

         {
              name: 'autoEmail',
              regex: new RegExp(
                  `(?![^<]*>|[^<>]*<\\/)((?:\\S*=)?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
                  'gim'
              ),
              replacement: (match, g1, g2) => {
                  const urlRegex = new RegExp(`^${URL_REGEX}$`, 'gi');
                  if (g1 && urlRegex.test(match)) {
                      return match;
                  }

                  return `${g1 || ''}<a href="mailto:${g2}">${g2}</a>`;
              }
          }

hoangzinh avatar Apr 08 '23 06:04 hoangzinh

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

What is the root cause of that problem?

The root cause is here https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 where we're parsing the autoEmail before the autoLink.

So when we're trying to parse https://staging.new.expensify.com/[email protected], it will parse //staging.new.expensify.com/[email protected] as an email. Then when we try to apply the autolink, it will not work because the <a> tags are already inserted by the autoEmail rule.

The ordering of autoEmail and autolink are like that for a reason, to avoid the auto link rule to apply to a part of the email address.

Let's consider this email [email protected], if the autolink rule is before the autoEmail rule, the google.com will become an auto link and it will not be parsed as an email.

What changes do you think we should make in order to solve the problem?

We need to do 3 things:

  1. Put the auto link rule before the autoEmail rule, this can be done simply by reodering this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 and this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L114

  2. Address this issue.

The ordering of autoEmail and autolink are like that for a reason, to avoid the auto link rule to apply to a part of the email address.

For this, we can modify the URL_WEBSITE_REGEX here https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 such that it will not match if the domain is followed by an email signature (a.k.a @domain.tld). This can be done by updating this rule https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 to

const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX})(?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))`;

(note the lookahead (?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+) to match the @domain.tld signature, it's the same regex we're using to match that part of an autoEmail)

Edited: Looks like this has been recently fixed in https://github.com/Expensify/expensify-common/pull/534 There's a hidden bug in here that makes the code abort all instances of matching links if the first match has @. This is because if abort is set to true once, it will persist on the next iteration of the matching links.

In current staging app, we can also easily reproduce it by sending the below

⁦@expensify.com 
https://www.expensify.com

The link https://www.expensify.com will not be detected.

To fix this we need to move the let abort = false; inside the while loop here https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L372 so it can be reset to false when processing the next match.

What alternative solutions did you explore? (Optional)

  • We can improve the regex in 2 if there're special edge cases that need to be addressed (special characters, ...), for example by adding
[a-zA-Z0-9.!$%&'*+=^_`{|}~-]

to match the email-compatible part between the matched domain and the @domain.tld signature, but the main point is the same. If an accidental URL that belongs to an email address is not parsed as an auto link, it will be safe to reorder the rules.

FYI the approach to avoid replacing the autoEmail if the matched string is an URL will not work if the matched email string is not the full URL but only a part of the URL (For example an URL with the & character)

tienifr avatar Apr 08 '23 06:04 tienifr

Proposal

Please re-state the problem that we are trying to solve in this issue.

if we parse link with email param, it parased as email not link.

What is the root cause of that problem?

the REGEXP of autoEmail allow to parse email in link.

What changes do you think we should make in order to solve the problem?

edit REGEXP of autoEmail to be match optional ((?:\\S*[=/])?) and use it in replacement case like this.

{
    name: 'autoEmail',
    regex: new RegExp(
        `(?![^<]*>|[^<>]*<\\/)((?:\\S*[=/])?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
        'gim',
    ),
    replacement: (match, g1, g2) => {
        // this condition will return true if g1 is link/auto link which mean this email is part of link/auto link.
        // we can also use URL_REGEX and MARKDOWN_URL_REGEX like this
        // if(g1 match URL_REGEX or g1 match MARKDOWN_URL_REGEX) return match;
        if (Str.isValidURL(Str.sanitizeURL(g1))) {
            // if email is part of link or auto link skip parsing.
            return match;
        }
        return g1 + `<a href="mailto:${g2}">${g2}</a>`;
    },
},

Explanation

  1. RegExp will match two groups ((?:\\S*[=/])?) and ${CONST.REG_EXP.MARKDOWN_EMAIL}.
  2. we will use group1 to determined if the matched email (group2) is part of link or not. 2.1 if group1 is found, we need to confirm it is a link and not a normal text like chars "-" or ":" 2.2 we will use Str.isValidURL or URL_REGEX to confirm the group1 is a link. 2.3 in case group1 is auto link, we will use Str.sanitizeURL to convert it to link, then check if it is a link or not Str.isValidURL(Str.sanitizeURL(g1)).
  3. if we confirm group1 is a link or auto link, skip parse auto email return match;
  4. if group1 is not a link, return group1 + parsed email return g1 + `<a href="mailto:${g2}">${g2}</a>`;

all test passed in expensify-common in this comment https://github.com/Expensify/App/issues/16762#issuecomment-1598810396

What alternative solutions did you explore? (Optional)

N/A

ahmedGaber93 avatar Apr 09 '23 01:04 ahmedGaber93

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

What is the root cause of that problem?

1 - When the user tries to send a deeplink with email as param, the parser rule that we have defined in the library rule autoEmail should automatically links the email. However our current rule can only work with HTML tags not the raw URL string.

2 - The placement of autoemail rule is also contributing to the problem, as it's been placed before autolink.

What changes do you think we should make in order to solve the problem?

Step 1 - Update the regex rule to prevent match emails if string contains http or https

Step 2 - Reorder autoEmail rule to check if the matching string is a URL, we will use URL_REGEX and don't need to format to an email link

What alternative solutions did you explore? (Optional)

none

anaskhraza avatar Apr 09 '23 10:04 anaskhraza

📣 @anaskhraza! 📣

Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:

  1. Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
  2. Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
  3. Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details.

Screen Shot 2022-11-16 at 4 42 54 PM

Format:

Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>

MelvinBot avatar Apr 09 '23 10:04 MelvinBot

Contributor details Your Expensify account email: [email protected] Upwork Profile Link: https://www.upwork.com/freelancers/~013888b99767abfe1e

anaskhraza avatar Apr 09 '23 10:04 anaskhraza

✅ Contributor details stored successfully. Thank you for contributing to Expensify!

MelvinBot avatar Apr 09 '23 10:04 MelvinBot

@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot avatar Apr 11 '23 10:04 MelvinBot

Hey @0xmiroslav @johnmlee101 how do the proposals look so far?

sakluger avatar Apr 12 '23 16:04 sakluger

@johnmlee101 @sakluger @0xmiroslav this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

MelvinBot avatar Apr 13 '23 10:04 MelvinBot

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

MelvinBot avatar Apr 14 '23 16:04 MelvinBot

@0xmiroslav Do you mind reviewing the proposals please? 🙇

johnmlee101 avatar Apr 14 '23 16:04 johnmlee101

Proposals are still in review. Testing various cases since the proposed solutions might break already working auto-email, auto-links easily, which means regression.

0xmiroslav avatar Apr 14 '23 20:04 0xmiroslav

@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot avatar Apr 18 '23 10:04 MelvinBot

still in review

0xmiroslav avatar Apr 18 '23 14:04 0xmiroslav

@johnmlee101 @sakluger @0xmiroslav this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

MelvinBot avatar Apr 20 '23 10:04 MelvinBot

Expect an update today

0xmiroslav avatar Apr 20 '23 10:04 0xmiroslav

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

MelvinBot avatar Apr 21 '23 16:04 MelvinBot