App
App copied to clipboard
[$1000] Can't parse deep link with email param.
If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!
Action Performed:
- Try to open user detail from deep link on browser (for example: https://staging.new.expensify.com/[email protected]).
- Now go to any report, paste that link and click send.
- Notice that it can't parse the whole link, and when user try to click on it, it opens the email app with the wrong email filled.
Expected Result:
App should parse the deep link detail with email param.
Actual Result:
App can't parse the deep link detail with email param. The URL prefix (https://
) is not part of the link, and we instead link incorrectly.
Workaround:
Can the user still use Expensify without this being fixed? Have you informed them of the workaround?
Platforms:
Which of our officially supported platforms is this issue occurring on?
- [ ] Android / native
- [ ] Android / Chrome
- [ ] iOS / native
- [ ] iOS / Safari
- [x] MacOS / Chrome / Safari
- [ ] MacOS / Desktop
Version Number: 1.2.92-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation
https://user-images.githubusercontent.com/43996225/228922658-ef1627e6-9f3e-4f91-a52a-8cb26a5ea47f.mov

Expensify/Expensify Issue URL: Issue reported by: @hungvu193 Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1680191331185739
Upwork Automation - Do Not Edit
- Upwork Job URL: https://www.upwork.com/jobs/~013659e025d6ce0e4f
- Upwork Job ID: 1644482548992663552
- Last Price Increase: 2023-04-21
Triggered auto assignment to @sakluger (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
Bug0 Triage Checklist (Main S/O)
- [x] This "bug" occurs on a supported platform (ensure
Platforms
in OP are ✅) - [x] This bug is not a duplicate report (check E/App issues and #expensify-bugs)
- If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
- [x] This bug is reproducible using the reproduction steps in the OP. S/O
- If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
- If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
- [x] This issue is filled out as thoroughly and clearly as possible
- Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
- [x] I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync
@sakluger Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
@sakluger Huh... This is 4 days overdue. Who can take care of this?
@sakluger Still overdue 6 days?! Let's take care of this!
I was able to reproduce this issue, feels like a bug worth solving!
Job added to Upwork: https://www.upwork.com/jobs/~013659e025d6ce0e4f
Current assignee @sakluger is eligible for the External assigner, not assigning anyone new.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @0xmiroslav (External
)
Triggered auto assignment to @johnmlee101 (External
), see https://stackoverflow.com/c/expensify/questions/7972 for more details.
I couldn't reproduce this issue. I am not sure how exactly you are copying the url. Can you provide more details, or elaborate on copying step a little bit more
Proposal
Please re-state the problem that we are trying to solve in this issue.
Deep links which contain an email address are incorrectly parsed thereby resulting in broken links in the chat history.
What is the root cause of that problem?
The ExpensiMark parser is processing the autoEmail
rule before the autoLink
rule, which results in a broken link, because the email part of the deep link is parsed first, hence the full link will never be recognised.
The currently defined order is like so: https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L95-L130
What changes do you think we should make in order to solve the problem?
Update the autoEmail
rule like so:
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)(^| |(?<!http(s?):\\S*))${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim',
),
replacement: (match, g1, g2, g3) => {
let repl = `<a href="mailto:${g3}">${g3}</a>`;
if (match.startsWith(' ')) {
repl = ' ' + repl;
}
return repl;
}
}
This will make sure that it only matches an email that starts at the beginning of the line, or if there's a space preceding the email, thereby excluding emails that are already in links. The second capture group contains the matching email, so we use that instead of match, which may include a space. The replacement string is conditionally updated to be prefixed with a space based on the match.
This has been tested and works well with the following scenarios:
[email protected] https://www.expensify.com
bruh [email protected] https://www.expensify.com
https://www.expensify.com [email protected]
https://staging.new.expensify.com/details?login= [email protected] bacon [email protected]
multiple spaces https://www.expensify.com [email protected]
[test ]]([email protected])
another scenario [[email protected]]
checking boundaries \[email protected]
https://staging.new.expensify.com/details?name=test&email= [email protected]
[email protected] last check
[email protected] [email protected] tests fine
https://www.expensify.com?name=test&[email protected]
[email protected] https://www.expensify.com?name=test&[email protected] [email protected]
Please note that the faulty parsed HTML result (with the broken links) occurs at the composer end, before the data is sent to the API, so previous messages that were already malformed will still remain the same way even after updating the ExpensiMark class implementation. A user would have to edit these messages after the changes have been applied in order to fix the broken links.
What alternative solutions did you explore? (Optional)
None.
Proposal
Please re-state the problem that we are trying to solve in this issue.
App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and the rest is in mailto protocol => we link incorrectly.
What is the root cause of that problem?
Context: We're using an internal library to parse a message from text to html in this line
The issue described in GH issue happens only when users send a plain-text url that contain email as a param. For more details, with a plain-text url string, when it's passed into our parser, it will reach this parser's rule autoEmail. As described in description, this rule tries to "automatically links emails that are not in a link". But it only works with HTML tag but not raw url string.
i.e Given input is https://staging.new.expensify.com/[email protected]
=> matching text of this rule will be //staging.new.expensify.com/[email protected]
=> output after this rule is <a href="mailto://staging.new.expensify.com/[email protected]">//staging.new.expensify.com/[email protected]</a>
=> it leads to the out of whole parser is https:<a href="mailto://staging.new.expensify.com/[email protected]">//staging.new.expensify.com/[email protected]</a>
What changes do you think we should make in order to solve the problem?
To fix this issue, we need to:
- Update the regex rule of autoEmail so that it includes optional substring regex
((?:\\S*=)?)
- Update
replacement
of autoEmail rule so that, if the matching string is an URL, we don't need to format to an email link.
Code example
Replace those LOC by
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)((?:\\S*=)?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim'
),
replacement: (match, g1, g2) => {
const urlRegex = new RegExp(`^${URL_REGEX}$`, 'gi');
if (g1 && urlRegex.test(match)) {
return match;
}
return `${g1 || ''}<a href="mailto:${g2}">${g2}</a>`;
}
}
Proposal
Please re-state the problem that we are trying to solve in this issue.
App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.
What is the root cause of that problem?
The root cause is here https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 where we're parsing the autoEmail
before the autoLink
.
So when we're trying to parse https://staging.new.expensify.com/[email protected]
, it will parse //staging.new.expensify.com/[email protected]
as an email. Then when we try to apply the autolink
, it will not work because the <a>
tags are already inserted by the autoEmail
rule.
The ordering of autoEmail
and autolink
are like that for a reason, to avoid the auto link rule to apply to a part of the email address.
Let's consider this email [email protected]
, if the autolink
rule is before the autoEmail
rule, the google.com
will become an auto link and it will not be parsed as an email.
What changes do you think we should make in order to solve the problem?
We need to do 3 things:
-
Put the auto link rule before the autoEmail rule, this can be done simply by reodering this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 and this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L114
-
Address this issue.
The ordering of
autoEmail
andautolink
are like that for a reason, to avoid the auto link rule to apply to a part of the email address.
For this, we can modify the URL_WEBSITE_REGEX
here https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 such that it will not match if the domain is followed by an email signature (a.k.a @domain.tld
). This can be done by updating this rule https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3
to
const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX})(?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))`;
(note the lookahead (?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)
to match the @domain.tld
signature, it's the same regex we're using to match that part of an autoEmail
)
Edited: Looks like this has been recently fixed in https://github.com/Expensify/expensify-common/pull/534
There's a hidden bug in here that makes the code abort all instances of matching links if the first match has @
. This is because if abort
is set to true once, it will persist on the next iteration of the matching links.
In current staging app, we can also easily reproduce it by sending the below
@expensify.com
https://www.expensify.com
The link https://www.expensify.com
will not be detected.
To fix this we need to move the let abort = false;
inside the while
loop here https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L372 so it can be reset to false
when processing the next match.
What alternative solutions did you explore? (Optional)
- We can improve the regex in
2
if there're special edge cases that need to be addressed (special characters, ...), for example by adding
[a-zA-Z0-9.!$%&'*+=^_`{|}~-]
to match the email-compatible part between the matched domain and the @domain.tld
signature, but the main point is the same. If an accidental URL that belongs to an email address is not parsed as an auto link, it will be safe to reorder the rules.
FYI the approach to avoid replacing the autoEmail if the matched string is an URL will not work if the matched email string is not the full URL but only a part of the URL (For example an URL with the &
character)
Proposal
Please re-state the problem that we are trying to solve in this issue.
if we parse link with email param, it parased as email not link.
What is the root cause of that problem?
the REGEXP of autoEmail
allow to parse email in link.
What changes do you think we should make in order to solve the problem?
edit REGEXP of autoEmail
to be match optional ((?:\\S*[=/])?)
and use it in replacement case like this.
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)((?:\\S*[=/])?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim',
),
replacement: (match, g1, g2) => {
// this condition will return true if g1 is link/auto link which mean this email is part of link/auto link.
// we can also use URL_REGEX and MARKDOWN_URL_REGEX like this
// if(g1 match URL_REGEX or g1 match MARKDOWN_URL_REGEX) return match;
if (Str.isValidURL(Str.sanitizeURL(g1))) {
// if email is part of link or auto link skip parsing.
return match;
}
return g1 + `<a href="mailto:${g2}">${g2}</a>`;
},
},
Explanation
- RegExp will match two groups
((?:\\S*[=/])?)
and${CONST.REG_EXP.MARKDOWN_EMAIL}
. - we will use group1 to determined if the matched email (group2) is part of link or not.
2.1 if group1 is found, we need to confirm it is a link and not a normal text like chars "-" or ":"
2.2 we will use
Str.isValidURL
orURL_REGEX
to confirm the group1 is a link. 2.3 in case group1 is auto link, we will useStr.sanitizeURL
to convert it to link, then check if it is a link or notStr.isValidURL(Str.sanitizeURL(g1))
. - if we confirm group1 is a link or auto link, skip parse auto email
return match;
- if group1 is not a link, return group1 + parsed email
return g1 + `<a href="mailto:${g2}">${g2}</a>`;
all test passed in expensify-common in this comment https://github.com/Expensify/App/issues/16762#issuecomment-1598810396
What alternative solutions did you explore? (Optional)
N/A
Proposal
Please re-state the problem that we are trying to solve in this issue.
App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.
What is the root cause of that problem?
1 - When the user tries to send a deeplink with email as param, the parser rule that we have defined in the library rule autoEmail should automatically links the email. However our current rule can only work with HTML tags not the raw URL string.
2 - The placement of autoemail rule is also contributing to the problem, as it's been placed before autolink.
What changes do you think we should make in order to solve the problem?
Step 1 - Update the regex rule to prevent match emails if string contains http or https
Step 2 - Reorder autoEmail rule to check if the matching string is a URL, we will use URL_REGEX and don't need to format to an email link
What alternative solutions did you explore? (Optional)
none
📣 @anaskhraza! 📣
Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
- Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
- Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
- Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details.
Format:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
Contributor details Your Expensify account email: [email protected] Upwork Profile Link: https://www.upwork.com/freelancers/~013888b99767abfe1e
✅ Contributor details stored successfully. Thank you for contributing to Expensify!
@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
Hey @0xmiroslav @johnmlee101 how do the proposals look so far?
@johnmlee101 @sakluger @0xmiroslav this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
@0xmiroslav Do you mind reviewing the proposals please? 🙇
Proposals are still in review. Testing various cases since the proposed solutions might break already working auto-email, auto-links easily, which means regression.
@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
still in review
@johnmlee101 @sakluger @0xmiroslav this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!
Expect an update today
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸