[$250] Can't create hyperlink with different protocol other than HTTP or HTTPS reported by @kerupuksambel

Open kavimuru opened this issue 1 year ago • 25 comments

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!

Action Performed:

Open staging.new.expensify.com
Send this HTML snippet to someone : ftp://google.com/

Expected Result:

The FTP link should be rendered as hyperlink (e.g. ftp://google.com/)

Actual Result:

The markdown doesn't rendered and it's only put as it is (ftp://google.com/)

Workaround:

unknown

Platform:

Where is this issue occurring?

Version Number: 1.2.21-4

Reproducible in staging?: y

Reproducible in production?: y

Email or phone of affected tester (no customers):

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos:

https://user-images.githubusercontent.com/43996225/198912565-9e296d72-0006-4cbe-9ab0-1e558826ae09.mp4

Expensify/Expensify Issue URL:

Issue reported by: @kerupuksambel

Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1666996403875309

View all open jobs on GitHub

Oct 31 '22 01:10 kavimuru

Triggered auto assignment to @lschurr (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

Oct 31 '22 01:10 melvin-bot[bot]

I was able to reproduce. Making sure there isn't another open issue for this. https://expensify.slack.com/archives/C01GTK53T8Q/p1667236678452169?thread_ts=1666996403.875309&cid=C01GTK53T8Q

Oct 31 '22 17:10 lschurr

Current assignee @lschurr is eligible for the External assigner, not assigning anyone new.

Oct 31 '22 20:10 melvin-bot[bot]

Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External)

Oct 31 '22 20:10 melvin-bot[bot]

Triggered auto assignment to @marcaaron (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

Oct 31 '22 20:10 melvin-bot[bot]

Hey @marcaaron - could you just double check this and confirm this is different from other reports? Thread here.

Oct 31 '22 20:10 lschurr

Looks new to me and agree it can be it's own bug I think.

Separately, whoever fixes our link parser should also add a unit test for this.

Nov 01 '22 16:11 marcaaron

Looks like we never supported protocols other than HTTP.

What protocols are we expecting to be supported?

Nov 01 '22 16:11 parasharrajat

Posted job: https://www.upwork.com/ab/applicants/1587482663583879168/job-details https://www.upwork.com/jobs/~01e8c596a397abd681

Nov 01 '22 16:11 lschurr

Sorry for joining the discussion suddenly, but I think if we preferred some kind of hard-coded list about what to add, I think we can refer to IANA's URI scheme, along with additional unofficial, but popular URI scheme that could be added. Some suggestions maybe including schemes like tel and mailto for contacting phone number and email address respectively, ftp for accessing FTP server, zoommtg for launching Zoom client, slack for launching Slack client, msteams for launching Microsoft Teams client, git for Git repository, etc.

However, if the list approach seems to be too long or not flexible, I think we could use some regex wildcards to get the valid schemes to be allowed. The simple example I could think of is something like (([a-z0-9\-\+\.])+) (Reference from URI syntax reference in Wikipedia). However, it seems that there are several edge cases that should be covered such as tel and mailto which doesn't have double slash at its scheme.

Nov 01 '22 17:11 kerupuksambel

Good question, @lschurr What protocols are we expecting to be supported? This issue states ftp. Are there more?

Nov 01 '22 21:11 parasharrajat

@marcaaron - are you able to weigh in on these questions?

Nov 01 '22 22:11 lschurr

imo we should not have an allow list for specific scheme. In the world of deep links any text based scheme should be potentially valid.

Nov 01 '22 23:11 marcaaron

seems like @kerupuksambel has a good grasp of this subject. Will you consider putting a proposal together so we get you working on this?

Nov 01 '22 23:11 marcaaron

@marcaaron Sure thing!

Proposal

The problem here, as mentioned by @parasharrajat, is that the URI parser only matched the one with HTTP/HTTPS scheme. Therefore, my proposal is to check all the valid URI scheme.

My first proposal is going to change this

const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;

in https://github.com/Expensify/expensify-common/blob/a71aad37aaba2b43b3a2d2f4be8052b025b265d3/lib/ExpensiMark.js#L5 into this (slashes in Regex was escaped due to the nature of Javascript string)

const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`

Checking with regex101.com, the new regex have some satisfying result on some test cases that wasn't covered by the old regex.

Nov 02 '22 03:11 kerupuksambel

@parasharrajat - Could you review this proposal?

Nov 02 '22 16:11 lschurr

@kerupuksambel

This suggestion literally accepts every protocol which I think is fine. But this is not complete.

It does not handle link conversion. If you pass through the parser ftp://google.com/ with your solution.

You will get ftp://google.com/ as the text of the link and https://ftp://google.com/ as URL which is incorrect.

Nov 02 '22 16:11 parasharrajat

Bumped back to daily per our new BugZero process, internal Slack thread with deets

Nov 05 '22 15:11 mallenexpensify

@parasharrajat Really sorry for my late reply, I was sick a few days back then. Also, I completed the changes so it should be working as proposed. Therefore, if possible, I would like to revise my proposal.

Revised Proposal

Upon my further analysis to fix this bug, I found out several things, such as possibility to do XSS by javascript scheme. My take on that problem is to filter everything with javascript scheme (for example javascript:alert(1)).

Also, as a problem pointed by @parasharrajat, now the fixed proposal also should create the hyperlink with correct scheme as well.

Therefore, this is my proposal for the change of the code in lib/ExpensiMark.js in expensify-common repository.

diff --git a/lib/ExpensiMark.js b/lib/ExpensiMark.js
index 1b7c39d..6eb1ffe 100644
--- a/lib/ExpensiMark.js
+++ b/lib/ExpensiMark.js
@@ -2,7 +2,7 @@ import _ from 'underscore';
 import Str from './str';
 import TLD_REGEX from './tlds';
 
-const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;
+const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`
 const addEscapedChar = reg => `(?:${reg}|&(?:amp|quot|#x27);)`;
 const URL_PATH_REGEX = `(?:${addEscapedChar('[.,=(+$!*]')}?\\/${addEscapedChar('[-\\w$@.+!*:(),=%~]')}*${addEscapedChar('[-\\w~@:%)]')}|\\/)*`;
 const URL_PARAM_REGEX = `(?:\\?${addEscapedChar('[-\\w$@.+!*()\\/,=%{}:;\\[\\]\\|_]')}+)?`;
@@ -81,13 +81,14 @@ export default class ExpensiMark {
                 },
 
                 // We use a function here to check if there is already a https:// on the link.
+                // Also, we prevented XSS by filter whether the protocol is javascript or not
                 // If there is not, we force the link to be absolute by prepending '//' to the target.
                 replacement: (match, g1, g2, g3) => {
                     if (!g1.trim()) {
                         return match;
                     }
 
-                    return `<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
+                    return `<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
                 },
             },
 
@@ -117,9 +118,12 @@ export default class ExpensiMark {
                 },
 
                 // We use a function here to check if there is already a https:// on the link.
+                // Also, we prevented XSS by filter whether the protocol is javascript or not
                 // If there is not, we force the link to be absolute by prepending '//' to the target.
+
+
                 replacement: (match, g1, g2, g3) => (
-                    `${g1}<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
+                    `${g1}<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
                 ),
             },
             {

Nov 06 '22 07:11 kerupuksambel

Thank you for the updated proposal. Let me discuss this internally to get some details on https://github.com/Expensify/App/issues/12290#issuecomment-1299216934

Nov 06 '22 12:11 parasharrajat

Asked here https://expensify.slack.com/archives/C01GTK53T8Q/p1667828343990809.

Nov 07 '22 13:11 parasharrajat

Seems like we're going to move forward, right @parasharrajat?

Nov 08 '22 16:11 lschurr

I need to test a few things before we can. Let me do that first.

Nov 08 '22 16:11 parasharrajat

@kerupuksambel tests are failing. Can you please test your proposal and fix it so that it does not break old tests? Also, you will have to add more unit tests to test the new functionality as part of this task.

  const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+
 																				 ^

Why the @ was added to this regex here?

Nov 09 '22 12:11 parasharrajat

Hi @parasharrajat, first really apologize again for my slow response towards the issue. And yes, so far the test without the "@" here would pass the test. The "@" here supposed to handle email protocol, however upon better inspection towards the code, I realized the email has been handled on the code above, therefore it should be no longer needed on the code. Therefore, the revised regex should be :

const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`

And the regex passed the test (with added test units below)

As for test case, I propose this unit tests for testing the functionality of the code.

diff --git a/__tests__/ExpensiMark-HTML-test.js b/__tests__/ExpensiMark-HTML-test.js
index e1d1b2f..5895a74 100644
--- a/__tests__/ExpensiMark-HTML-test.js
+++ b/__tests__/ExpensiMark-HTML-test.js
@@ -337,6 +337,25 @@ test('Test markdown style email link with various styles', () => {
     expect(parser.replace(testString)).toBe(resultString);
 });
 
+
+test('Test URI with different schemes, while also filtering dangerous schemes (javascript)', () => {
+    const testString = 'This is [FTP link](ftp://expensify.com), '
+        + '[zoom-mtg link](zoom-mtg://expensify.com), '
+        + '[arbitrary scheme link](abc-def+12.3://expensify.com), '
+        + '[scheme with mixed case](FtP://expensify.com), '
+        + '[but still ensure no http still parsed as usual](expensify.com), '
+        + '[and javascript scheme should be filtered for better security.](javascript:console.com)';
+    
+    const resultString = 'This is <a href="ftp://expensify.com" target="_blank" rel="noreferrer noopener">FTP link</a>, '
+        + '<a href="zoom-mtg://expensify.com" target="_blank" rel="noreferrer noopener">zoom-mtg link</a>, '
+        + '<a href="abc-def+12.3://expensify.com" target="_blank" rel="noreferrer noopener">arbitrary scheme link</a>, '
+        + '<a href="FtP://expensify.com" target="_blank" rel="noreferrer noopener">scheme with mixed case</a>, '
+        + '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">but still ensure no http still parsed as usual</a>, '
+        + '<a href="https://javascript:console.com" target="_blank" rel="noreferrer noopener">and javascript scheme should be filtered for better security.</a>';
+    
+    expect(parser.replace(testString)).toBe(resultString);    
+})
+
 test('Test general email link with various styles', () => {
     const testString = 'Go to [email protected] '
         + '[email protected] '

Nov 12 '22 07:11 kerupuksambel

Hi @parasharrajat - Could you review?

Nov 14 '22 15:11 lschurr

Yup. Sure.

Nov 14 '22 17:11 parasharrajat

Ok so I tested it. It is looking good.

the javascript: scheme part . let's say the parser is parsing

javascript:console.com

Your solution will give us

https://javascript:console.com

Not sure what should be the output here. Does this look correct? For me, it is fine as it is a https url.

@kerupuksambel 's proposal + url regex from https://github.com/Expensify/App/issues/12290#issuecomment-1312401880 including tests, looks good to me.

I am no security expert but I will suggest someone from the internal team review this.

cc: @marcaaron

:ribbon: :eyes: :ribbon: C+ reviewed

Nov 14 '22 17:11 parasharrajat

Sorry, quick clarification, why are we parsing javascript:console.com? I'd think that would give us plain text as it is not a url?

Nov 14 '22 18:11 marcaaron

javascript scheme should be filtered for better security

My 2 cents I really don't think this is a requirement for this issue

Nov 14 '22 18:11 marcaaron

App App copied to clipboard

[$250] Can't create hyperlink with different protocol other than HTTP or HTTPS reported by @kerupuksambel

Action Performed:

Expected Result:

Actual Result:

Workaround:

Platform:

Proposal

Revised Proposal

App
App copied to clipboard