App icon indicating copy to clipboard operation
App copied to clipboard

[$250] Can't create hyperlink with different protocol other than HTTP or HTTPS reported by @kerupuksambel

Open kavimuru opened this issue 1 year ago • 25 comments

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Action Performed:

  1. Open staging.new.expensify.com
  2. Send this HTML snippet to someone : ftp://google.com/

Expected Result:

The FTP link should be rendered as hyperlink (e.g. ftp://google.com/)

Actual Result:

The markdown doesn't rendered and it's only put as it is (ftp://google.com/)

Workaround:

unknown

Platform:

Where is this issue occurring?

  • Web

Version Number: 1.2.21-4

Reproducible in staging?: y

Reproducible in production?: y

Email or phone of affected tester (no customers):

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos:

https://user-images.githubusercontent.com/43996225/198912565-9e296d72-0006-4cbe-9ab0-1e558826ae09.mp4

ftp

Expensify/Expensify Issue URL:

Issue reported by: @kerupuksambel

Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1666996403875309

View all open jobs on GitHub

kavimuru avatar Oct 31 '22 01:10 kavimuru

Triggered auto assignment to @lschurr (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

melvin-bot[bot] avatar Oct 31 '22 01:10 melvin-bot[bot]

I was able to reproduce. Making sure there isn't another open issue for this. https://expensify.slack.com/archives/C01GTK53T8Q/p1667236678452169?thread_ts=1666996403.875309&cid=C01GTK53T8Q

lschurr avatar Oct 31 '22 17:10 lschurr

Current assignee @lschurr is eligible for the External assigner, not assigning anyone new.

melvin-bot[bot] avatar Oct 31 '22 20:10 melvin-bot[bot]

Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External)

melvin-bot[bot] avatar Oct 31 '22 20:10 melvin-bot[bot]

Triggered auto assignment to @marcaaron (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

melvin-bot[bot] avatar Oct 31 '22 20:10 melvin-bot[bot]

Hey @marcaaron - could you just double check this and confirm this is different from other reports? Thread here.

lschurr avatar Oct 31 '22 20:10 lschurr

Looks new to me and agree it can be it's own bug I think.

Separately, whoever fixes our link parser should also add a unit test for this.

marcaaron avatar Nov 01 '22 16:11 marcaaron

Looks like we never supported protocols other than HTTP. image

What protocols are we expecting to be supported?

parasharrajat avatar Nov 01 '22 16:11 parasharrajat

Posted job: https://www.upwork.com/ab/applicants/1587482663583879168/job-details https://www.upwork.com/jobs/~01e8c596a397abd681

lschurr avatar Nov 01 '22 16:11 lschurr

Sorry for joining the discussion suddenly, but I think if we preferred some kind of hard-coded list about what to add, I think we can refer to IANA's URI scheme, along with additional unofficial, but popular URI scheme that could be added. Some suggestions maybe including schemes like tel and mailto for contacting phone number and email address respectively, ftp for accessing FTP server, zoommtg for launching Zoom client, slack for launching Slack client, msteams for launching Microsoft Teams client, git for Git repository, etc.

However, if the list approach seems to be too long or not flexible, I think we could use some regex wildcards to get the valid schemes to be allowed. The simple example I could think of is something like (([a-z0-9\-\+\.])+) (Reference from URI syntax reference in Wikipedia). However, it seems that there are several edge cases that should be covered such as tel and mailto which doesn't have double slash at its scheme.

kerupuksambel avatar Nov 01 '22 17:11 kerupuksambel

Good question, @lschurr What protocols are we expecting to be supported? This issue states ftp. Are there more?

parasharrajat avatar Nov 01 '22 21:11 parasharrajat

@marcaaron - are you able to weigh in on these questions?

lschurr avatar Nov 01 '22 22:11 lschurr

imo we should not have an allow list for specific scheme. In the world of deep links any text based scheme should be potentially valid.

marcaaron avatar Nov 01 '22 23:11 marcaaron

seems like @kerupuksambel has a good grasp of this subject. Will you consider putting a proposal together so we get you working on this?

marcaaron avatar Nov 01 '22 23:11 marcaaron

@marcaaron Sure thing!

Proposal

The problem here, as mentioned by @parasharrajat, is that the URI parser only matched the one with HTTP/HTTPS scheme. Therefore, my proposal is to check all the valid URI scheme.

My first proposal is going to change this

const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;

in https://github.com/Expensify/expensify-common/blob/a71aad37aaba2b43b3a2d2f4be8052b025b265d3/lib/ExpensiMark.js#L5 into this (slashes in Regex was escaped due to the nature of Javascript string)

const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`

Checking with regex101.com, the new regex have some satisfying result on some test cases that wasn't covered by the old regex. image

kerupuksambel avatar Nov 02 '22 03:11 kerupuksambel

@parasharrajat - Could you review this proposal?

lschurr avatar Nov 02 '22 16:11 lschurr

@kerupuksambel

This suggestion literally accepts every protocol which I think is fine. But this is not complete.

It does not handle link conversion. If you pass through the parser ftp://google.com/ with your solution.

You will get ftp://google.com/ as the text of the link and https://ftp://google.com/ as URL which is incorrect.

parasharrajat avatar Nov 02 '22 16:11 parasharrajat

Bumped back to daily per our new BugZero process, internal Slack thread with deets

mallenexpensify avatar Nov 05 '22 15:11 mallenexpensify

@parasharrajat Really sorry for my late reply, I was sick a few days back then. Also, I completed the changes so it should be working as proposed. Therefore, if possible, I would like to revise my proposal.

Revised Proposal

Upon my further analysis to fix this bug, I found out several things, such as possibility to do XSS by javascript scheme. My take on that problem is to filter everything with javascript scheme (for example javascript:alert(1)).

Also, as a problem pointed by @parasharrajat, now the fixed proposal also should create the hyperlink with correct scheme as well.

Therefore, this is my proposal for the change of the code in lib/ExpensiMark.js in expensify-common repository.

diff --git a/lib/ExpensiMark.js b/lib/ExpensiMark.js
index 1b7c39d..6eb1ffe 100644
--- a/lib/ExpensiMark.js
+++ b/lib/ExpensiMark.js
@@ -2,7 +2,7 @@ import _ from 'underscore';
 import Str from './str';
 import TLD_REGEX from './tlds';
 
-const URL_WEBSITE_REGEX = `(https?:\\/\\/)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`;
+const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`
 const addEscapedChar = reg => `(?:${reg}|&(?:amp|quot|#x27);)`;
 const URL_PATH_REGEX = `(?:${addEscapedChar('[.,=(+$!*]')}?\\/${addEscapedChar('[-\\w$@.+!*:(),=%~]')}*${addEscapedChar('[-\\w~@:%)]')}|\\/)*`;
 const URL_PARAM_REGEX = `(?:\\?${addEscapedChar('[-\\w$@.+!*()\\/,=%{}:;\\[\\]\\|_]')}+)?`;
@@ -81,13 +81,14 @@ export default class ExpensiMark {
                 },
 
                 // We use a function here to check if there is already a https:// on the link.
+                // Also, we prevented XSS by filter whether the protocol is javascript or not
                 // If there is not, we force the link to be absolute by prepending '//' to the target.
                 replacement: (match, g1, g2, g3) => {
                     if (!g1.trim()) {
                         return match;
                     }
 
-                    return `<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
+                    return `<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g1}</a>`;
                 },
             },
 
@@ -117,9 +118,12 @@ export default class ExpensiMark {
                 },
 
                 // We use a function here to check if there is already a https:// on the link.
+                // Also, we prevented XSS by filter whether the protocol is javascript or not
                 // If there is not, we force the link to be absolute by prepending '//' to the target.
+
+
                 replacement: (match, g1, g2, g3) => (
-                    `${g1}<a href="${g3 && g3.includes('http') ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
+                    `${g1}<a href="${g3 !== undefined && !g3.toLowerCase().includes("javascript") ? '' : 'https://'}${g2}" target="_blank" rel="noreferrer noopener">${g2}</a>${g1}`
                 ),
             },
             {

kerupuksambel avatar Nov 06 '22 07:11 kerupuksambel

Thank you for the updated proposal. Let me discuss this internally to get some details on https://github.com/Expensify/App/issues/12290#issuecomment-1299216934

parasharrajat avatar Nov 06 '22 12:11 parasharrajat

Asked here https://expensify.slack.com/archives/C01GTK53T8Q/p1667828343990809.

parasharrajat avatar Nov 07 '22 13:11 parasharrajat

Seems like we're going to move forward, right @parasharrajat?

lschurr avatar Nov 08 '22 16:11 lschurr

I need to test a few things before we can. Let me do that first.

parasharrajat avatar Nov 08 '22 16:11 parasharrajat

@kerupuksambel tests are failing. Can you please test your proposal and fix it so that it does not break old tests? Also, you will have to add more unit tests to test the new functionality as part of this task.

  const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9@]+?\\.)+
 																				 ^

Why the @ was added to this regex here?

parasharrajat avatar Nov 09 '22 12:11 parasharrajat

Hi @parasharrajat, first really apologize again for my slow response towards the issue. And yes, so far the test without the "@" here would pass the test. The "@" here supposed to handle email protocol, however upon better inspection towards the code, I realized the email has been handled on the code above, therefore it should be no longer needed on the code. Therefore, the revised regex should be :

const URL_WEBSITE_REGEX = `(([a-z0-9\\+\\-\\.])+:\\/?\\/?)?((?:www\\.)?[-a-z0-9]+?\\.)+(?:${TLD_REGEX})(?:\\:\\d{2,4}|\\b|(?=_))`

And the regex passed the test (with added test units below) image

As for test case, I propose this unit tests for testing the functionality of the code.

diff --git a/__tests__/ExpensiMark-HTML-test.js b/__tests__/ExpensiMark-HTML-test.js
index e1d1b2f..5895a74 100644
--- a/__tests__/ExpensiMark-HTML-test.js
+++ b/__tests__/ExpensiMark-HTML-test.js
@@ -337,6 +337,25 @@ test('Test markdown style email link with various styles', () => {
     expect(parser.replace(testString)).toBe(resultString);
 });
 
+
+test('Test URI with different schemes, while also filtering dangerous schemes (javascript)', () => {
+    const testString = 'This is [FTP link](ftp://expensify.com), '
+        + '[zoom-mtg link](zoom-mtg://expensify.com), '
+        + '[arbitrary scheme link](abc-def+12.3://expensify.com), '
+        + '[scheme with mixed case](FtP://expensify.com), '
+        + '[but still ensure no http still parsed as usual](expensify.com), '
+        + '[and javascript scheme should be filtered for better security.](javascript:console.com)';
+    
+    const resultString = 'This is <a href="ftp://expensify.com" target="_blank" rel="noreferrer noopener">FTP link</a>, '
+        + '<a href="zoom-mtg://expensify.com" target="_blank" rel="noreferrer noopener">zoom-mtg link</a>, '
+        + '<a href="abc-def+12.3://expensify.com" target="_blank" rel="noreferrer noopener">arbitrary scheme link</a>, '
+        + '<a href="FtP://expensify.com" target="_blank" rel="noreferrer noopener">scheme with mixed case</a>, '
+        + '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">but still ensure no http still parsed as usual</a>, '
+        + '<a href="https://javascript:console.com" target="_blank" rel="noreferrer noopener">and javascript scheme should be filtered for better security.</a>';
+    
+    expect(parser.replace(testString)).toBe(resultString);    
+})
+
 test('Test general email link with various styles', () => {
     const testString = 'Go to [email protected] '
         + '[email protected] '

kerupuksambel avatar Nov 12 '22 07:11 kerupuksambel

Hi @parasharrajat - Could you review?

lschurr avatar Nov 14 '22 15:11 lschurr

Yup. Sure.

parasharrajat avatar Nov 14 '22 17:11 parasharrajat

Ok so I tested it. It is looking good.

the javascript: scheme part . let's say the parser is parsing

javascript:console.com

Your solution will give us

https://javascript:console.com

Not sure what should be the output here. Does this look correct? For me, it is fine as it is a https url.


@kerupuksambel 's proposal + url regex from https://github.com/Expensify/App/issues/12290#issuecomment-1312401880 including tests, looks good to me.

I am no security expert but I will suggest someone from the internal team review this.

cc: @marcaaron

:ribbon: :eyes: :ribbon: C+ reviewed

parasharrajat avatar Nov 14 '22 17:11 parasharrajat

Sorry, quick clarification, why are we parsing javascript:console.com? I'd think that would give us plain text as it is not a url?

marcaaron avatar Nov 14 '22 18:11 marcaaron

javascript scheme should be filtered for better security

My 2 cents I really don't think this is a requirement for this issue

marcaaron avatar Nov 14 '22 18:11 marcaaron