App
App copied to clipboard
[$250] Can't create hyperlink with different protocol other than HTTP or HTTPS reported by @kerupuksambel
If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!
Action Performed:
- Open staging.new.expensify.com
- Send this HTML snippet to someone : ftp://google.com/
Expected Result:
The FTP link should be rendered as hyperlink (e.g. ftp://google.com/)
Actual Result:
The markdown doesn't rendered and it's only put as it is (ftp://google.com/)
Workaround:
unknown
Platform:
Where is this issue occurring?
- Web
Version Number: 1.2.21-4
Reproducible in staging?: y
Reproducible in production?: y
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos:
https://user-images.githubusercontent.com/43996225/198912565-9e296d72-0006-4cbe-9ab0-1e558826ae09.mp4
Expensify/Expensify Issue URL:
Issue reported by: @kerupuksambel
Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1666996403875309
Triggered auto assignment to @lschurr (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
I was able to reproduce. Making sure there isn't another open issue for this. https://expensify.slack.com/archives/C01GTK53T8Q/p1667236678452169?thread_ts=1666996403.875309&cid=C01GTK53T8Q
Current assignee @lschurr is eligible for the External assigner, not assigning anyone new.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External
)
Triggered auto assignment to @marcaaron (External
), see https://stackoverflow.com/c/expensify/questions/7972 for more details.
Hey @marcaaron - could you just double check this and confirm this is different from other reports? Thread here.
Looks new to me and agree it can be it's own bug I think.
Separately, whoever fixes our link parser should also add a unit test for this.
Looks like we never supported protocols other than HTTP.
What protocols are we expecting to be supported?
Posted job: https://www.upwork.com/ab/applicants/1587482663583879168/job-details https://www.upwork.com/jobs/~01e8c596a397abd681
Sorry for joining the discussion suddenly, but I think if we preferred some kind of hard-coded list about what to add, I think we can refer to IANA's URI scheme, along with additional unofficial, but popular URI scheme that could be added. Some suggestions maybe including schemes like tel
and mailto
for contacting phone number and email address respectively, ftp
for accessing FTP server, zoommtg
for launching Zoom client, slack
for launching Slack client, msteams
for launching Microsoft Teams client, git
for Git repository, etc.
However, if the list approach seems to be too long or not flexible, I think we could use some regex wildcards to get the valid schemes to be allowed. The simple example I could think of is something like (([a-z0-9\-\+\.])+)
(Reference from URI syntax reference in Wikipedia). However, it seems that there are several edge cases that should be covered such as tel
and mailto
which doesn't have double slash at its scheme.
Good question, @lschurr What protocols are we expecting to be supported? This issue states ftp
. Are there more?
@marcaaron - are you able to weigh in on these questions?
imo we should not have an allow list for specific scheme. In the world of deep links any text based scheme should be potentially valid.
seems like @kerupuksambel has a good grasp of this subject. Will you consider putting a proposal together so we get you working on this?
@marcaaron Sure thing!
Proposal
The problem here, as mentioned by @parasharrajat, is that the URI parser only matched the one with HTTP/HTTPS scheme. Therefore, my proposal is to check all the valid URI scheme.
My first proposal is going to change this
const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;
in https://github.com/Expensify/expensify-common/blob/a71aad37aaba2b43b3a2d2f4be8052b025b265d3/lib/ExpensiMark.js#L5 into this (slashes in Regex was escaped due to the nature of Javascript string)
const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`
Checking with regex101.com, the new regex have some satisfying result on some test cases that wasn't covered by the old regex.
@parasharrajat - Could you review this proposal?
@kerupuksambel
This suggestion literally accepts every protocol which I think is fine. But this is not complete.
It does not handle link conversion. If you pass through the parser ftp://google.com/
with your solution.
You will get ftp://google.com/
as the text of the link and https://ftp://google.com/
as URL which is incorrect.
Bumped back to daily per our new BugZero process, internal Slack thread with deets
@parasharrajat Really sorry for my late reply, I was sick a few days back then. Also, I completed the changes so it should be working as proposed. Therefore, if possible, I would like to revise my proposal.
Revised Proposal
Upon my further analysis to fix this bug, I found out several things, such as possibility to do XSS by javascript
scheme. My take on that problem is to filter everything with javascript
scheme (for example javascript:alert(1)
).
Also, as a problem pointed by @parasharrajat, now the fixed proposal also should create the hyperlink with correct scheme as well.
Therefore, this is my proposal for the change of the code in lib/ExpensiMark.js
in expensify-common repository.
diff --git a/lib/ExpensiMark.js b/lib/ExpensiMark.js
index 1b7c39d..6eb1ffe 100644
--- a/lib/ExpensiMark.js
+++ b/lib/ExpensiMark.js
@@ -2,7 +2,7 @@ import _ from 'underscore';
import Str from './str';
import TLD_REGEX from './tlds';
-const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;
+const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`
const addEscapedChar = reg => `(?:${reg}|&(?:amp|quot|#x27);)`;
const URL_PATH_REGEX = `(?:${addEscapedChar('[.,=(+$!*]')}?\\/${addEscapedChar('[-\\w$@.+!*:(),=%~]')}*${addEscapedChar('[-\\w~@:%)]')}|\\/)*`;
const URL_PARAM_REGEX = `(?:\\?${addEscapedChar('[-\\w$@.+!*()\\/,=%{}:;\\[\\]\\|_]')}+)?`;
@@ -81,13 +81,14 @@ export default class ExpensiMark {
},
// We use a function here to check if there is already a https:// on the link.
+ // Also, we prevented XSS by filter whether the protocol is javascript or not
// If there is not, we force the link to be absolute by prepending '//' to the target.
replacement: (match, g1, g2, g3) => {
if (!g1.trim()) {
return match;
}
- return `<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
+ return `<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
},
},
@@ -117,9 +118,12 @@ export default class ExpensiMark {
},
// We use a function here to check if there is already a https:// on the link.
+ // Also, we prevented XSS by filter whether the protocol is javascript or not
// If there is not, we force the link to be absolute by prepending '//' to the target.
+
+
replacement: (match, g1, g2, g3) => (
- `${g1}<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
+ `${g1}<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
),
},
{
Thank you for the updated proposal. Let me discuss this internally to get some details on https://github.com/Expensify/App/issues/12290#issuecomment-1299216934
Asked here https://expensify.slack.com/archives/C01GTK53T8Q/p1667828343990809.
Seems like we're going to move forward, right @parasharrajat?
I need to test a few things before we can. Let me do that first.
@kerupuksambel tests are failing. Can you please test your proposal and fix it so that it does not break old tests? Also, you will have to add more unit tests to test the new functionality as part of this task.
const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+
^
Why the @ was added to this regex here?
Hi @parasharrajat, first really apologize again for my slow response towards the issue. And yes, so far the test without the "@" here would pass the test. The "@" here supposed to handle email protocol, however upon better inspection towards the code, I realized the email has been handled on the code above, therefore it should be no longer needed on the code. Therefore, the revised regex should be :
const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`
And the regex passed the test (with added test units below)
As for test case, I propose this unit tests for testing the functionality of the code.
diff --git a/__tests__/ExpensiMark-HTML-test.js b/__tests__/ExpensiMark-HTML-test.js
index e1d1b2f..5895a74 100644
--- a/__tests__/ExpensiMark-HTML-test.js
+++ b/__tests__/ExpensiMark-HTML-test.js
@@ -337,6 +337,25 @@ test('Test markdown style email link with various styles', () => {
expect(parser.replace(testString)).toBe(resultString);
});
+
+test('Test URI with different schemes, while also filtering dangerous schemes (javascript)', () => {
+ const testString = 'This is [FTP link](ftp://expensify.com), '
+ + '[zoom-mtg link](zoom-mtg://expensify.com), '
+ + '[arbitrary scheme link](abc-def+12.3://expensify.com), '
+ + '[scheme with mixed case](FtP://expensify.com), '
+ + '[but still ensure no http still parsed as usual](expensify.com), '
+ + '[and javascript scheme should be filtered for better security.](javascript:console.com)';
+
+ const resultString = 'This is <a href="ftp://expensify.com" target="_blank" rel="noreferrer noopener">FTP link</a>, '
+ + '<a href="zoom-mtg://expensify.com" target="_blank" rel="noreferrer noopener">zoom-mtg link</a>, '
+ + '<a href="abc-def+12.3://expensify.com" target="_blank" rel="noreferrer noopener">arbitrary scheme link</a>, '
+ + '<a href="FtP://expensify.com" target="_blank" rel="noreferrer noopener">scheme with mixed case</a>, '
+ + '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">but still ensure no http still parsed as usual</a>, '
+ + '<a href="https://javascript:console.com" target="_blank" rel="noreferrer noopener">and javascript scheme should be filtered for better security.</a>';
+
+ expect(parser.replace(testString)).toBe(resultString);
+})
+
test('Test general email link with various styles', () => {
const testString = 'Go to [email protected] '
+ '[email protected] '
Hi @parasharrajat - Could you review?
Yup. Sure.
Ok so I tested it. It is looking good.
the javascript: scheme part . let's say the parser is parsing
javascript:console.com
Your solution will give us
https://javascript:console.com
Not sure what should be the output here. Does this look correct? For me, it is fine as it is a https url.
@kerupuksambel 's proposal + url regex from https://github.com/Expensify/App/issues/12290#issuecomment-1312401880 including tests, looks good to me.
I am no security expert but I will suggest someone from the internal team review this.
cc: @marcaaron
:ribbon: :eyes: :ribbon: C+ reviewed
Sorry, quick clarification, why are we parsing javascript:console.com
? I'd think that would give us plain text as it is not a url?
javascript scheme should be filtered for better security
My 2 cents I really don't think this is a requirement for this issue