
Add Support for mailto URLs in Encoder.getCanonicalizedURI()

Open xeno6696 opened this issue 7 years ago • 2 comments

The version of the following unit test in the baseline is incorrect, but this one correctly shows that we don't properly canonicalize a mailto URL. For the record, the regex we currently use as the default restricts URLs to the schemes "ftp" and "https*" (that is, http and https), so this is purely a future enhancement.

    public void testGetCanonicalizedUriWithMailto() throws Exception {
        Encoder e = ESAPI.encoder();

        String expectedUri = "mailto:[email protected]?subject=Doc, do da dance";
        // Please note that section 3.2.1 of RFC 3986 says applications should not
        // render password information (as in http://palpatine:[email protected])
        // as clear text, so it will not appear in the userinfo field.
        String input = "mailto:[email protected]?subject=Doc,%20do%20da%20dance";
        URI uri = new URI(input);
        System.out.println(uri.toString());
        assertEquals(expectedUri, e.getCanonicalizedURI(uri));
    }

xeno6696, Jul 30 '17 18:07

After some digging: mailto URIs have their own scheme-specific syntax, separate from the generic URI syntax.

From RFC 6068:

  mailtoURI    = "mailto:" [ to ] [ hfields ]
  to           = addr-spec *("," addr-spec )
  hfields      = "?" hfield *( "&" hfield )
  hfield       = hfname "=" hfvalue
  hfname       = *qchar
  hfvalue      = *qchar
  addr-spec    = local-part "@" domain
  local-part   = dot-atom-text / quoted-string
  domain       = dot-atom-text / "[" *dtext-no-obs "]"
  dtext-no-obs = %d33-90 / ; Printable US-ASCII
                 %d94-126  ; characters not including
                           ; "[", "]", or "\"
  qchar        = unreserved / pct-encoded / some-delims
  some-delims  = "!" / "$" / "'" / "(" / ")" / "*"
               / "+" / "," / ";" / ":" / "@"
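As a rough illustration of how this grammar splits a mailto URI into its "to" list and its hfields, here is a minimal Java sketch. MailtoParts and pctDecode are hypothetical names, not ESAPI API. It assumes that any ",", "?", "&", or "=" occurring inside an address or header value has been percent-encoded, and it does not validate addr-spec at all.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of ESAPI): splits a mailto URI along the
// RFC 6068 rule  mailtoURI = "mailto:" [ to ] [ hfields ].
// Assumes delimiters inside addresses/values are percent-encoded; no addr-spec validation.
class MailtoParts {
    final List<String> to = new ArrayList<>();
    final Map<String, String> headers = new LinkedHashMap<>(); // duplicate hfnames overwrite

    static MailtoParts parse(String uri) {
        if (!uri.regionMatches(true, 0, "mailto:", 0, 7)) {
            throw new IllegalArgumentException("not a mailto URI: " + uri);
        }
        MailtoParts parts = new MailtoParts();
        String rest = uri.substring(7);
        int q = rest.indexOf('?');
        String toPart = q < 0 ? rest : rest.substring(0, q);
        String hfields = q < 0 ? "" : rest.substring(q + 1);

        // to = addr-spec *("," addr-spec)
        if (!toPart.isEmpty()) {
            for (String addr : toPart.split(",")) {
                parts.to.add(pctDecode(addr));
            }
        }
        // hfields = "?" hfield *("&" hfield) ; hfield = hfname "=" hfvalue
        if (!hfields.isEmpty()) {
            for (String hfield : hfields.split("&")) {
                int eq = hfield.indexOf('=');
                String name = eq < 0 ? hfield : hfield.substring(0, eq);
                String value = eq < 0 ? "" : hfield.substring(eq + 1);
                parts.headers.put(pctDecode(name), pctDecode(value));
            }
        }
        return parts;
    }

    // UTF-8 percent-decoding. Unlike java.net.URLDecoder, '+' stays a literal '+',
    // since RFC 6068 does not treat '+' as a space.
    static String pctDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]); // not a valid %XX escape: copy the byte unchanged
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, parsing mailto:joe@example.com?cc=bob@example.com&body=hello yields a single address (joe@example.com) plus the header fields cc and body.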

xeno6696, Sep 01 '19 17:09

@xeno6696, I think we can do this if we convert the URI to a URL (not sure how feasible that is):

URI.toURL()

At that point we can check whether the URL's protocol is 'mailto'. If it is, getPath() returns the address part.
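The URL-based check could look roughly like this. It is a sketch only: mailtoAddress is a hypothetical helper, not an ESAPI method, and it relies on the JDK's built-in mailto URL handler placing the address in getPath() and the header fields in getQuery(), which matches the output dumped further down.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper (not an ESAPI method) sketching the URI.toURL() idea.
class MailtoViaUrl {

    // Returns the raw address part (URL.getPath()) for a mailto URI, or null for
    // any other scheme. URI.toURL() throws IllegalArgumentException if the URI
    // is not absolute.
    static String mailtoAddress(URI uri) {
        URL url;
        try {
            url = uri.toURL();
        } catch (MalformedURLException ex) {
            // No protocol handler for this scheme, or the URL could not be built.
            throw new IllegalArgumentException("cannot convert to URL: " + uri, ex);
        }
        if (!"mailto".equals(url.getProtocol())) {
            return null;
        }
        // Any header fields (subject=..., body=...) end up in url.getQuery().
        return url.getPath();
    }
}
```

Checked exceptions are wrapped here only to keep the sketch easy to call; the real code would likely just propagate them.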

I grabbed some sample mailto addresses from RFC 6068 (https://tools.ietf.org/html/rfc6068)

and built a simple test case that pumps them through the URL class and dumps the value of every String-returning get* method. The results seem pretty consistent.

import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;

import org.junit.Test;
import org.owasp.esapi.codecs.HTMLEntityCodec;

public class MailToUriTest {
	//https://tools.ietf.org/html/rfc6068
	String[] basic = new String[] {"mailto:[email protected]", "mailto:[email protected]?subject=current-issue", "mailto:[email protected]?body=send%20current-issue", "mailto:infobot@\r\n" + 
			"example.com?body=send%20current-issue%0D%0Asend%20index", "mailto:[email protected]?In-Reply-To=%3C3469A91.D10AF4C@\r\n" + 
					"   example.com%3E", "mailto:[email protected]?body=subscribe%20bamboo-l","mailto:[email protected][email protected]&body=hello", "mailto:[email protected][email protected]?body=hello", "mailto:gorby%[email protected]" };
	String[] complicated = new String[] {"mailto:\"not@me\"@example.org","mailto:\"oh\\\\no\"@example.org","mailto:\"\\\\\\\"it's\\ ugly\\\\\\\"\"@example.org"};
	@Test
	public void testURI() throws MalformedURLException, Exception {
		//String mailto = "mailto:[email protected]?subject";
		HTMLEntityCodec codec = new HTMLEntityCodec();
		for (String mailto : basic) {
			mailto = codec.decode(mailto);
			System.out.println(mailto);
			URL url = new URL(mailto);
			dumpGetMethods(url);
		}

		for (String mailto : complicated) {
			mailto = codec.decode(mailto);
			System.out.println(mailto);
			URL url = new URL(mailto);
			dumpGetMethods(url);
		}
	}

	private void dumpGetMethods(URL url) throws Exception {
		for (Method m : URL.class.getMethods()) {
			if (m.getName().startsWith("get") && m.getReturnType().equals(String.class)) {
				System.out.println("\t" + m.getName() + " " + m.invoke(url));
			}
		}
	}
}


mailto:[email protected]
	getAuthority null
	getPath [email protected]
	getQuery null
	getFile [email protected]
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected]?subject=current-issue
	getAuthority null
	getPath [email protected]
	getQuery subject=current-issue
	getFile [email protected]?subject=current-issue
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected]?body=send%20current-issue
	getAuthority null
	getPath [email protected]
	getQuery body=send%20current-issue
	getFile [email protected]?body=send%20current-issue
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:infobot@
example.com?body=send%20current-issue%0D%0Asend%20index
	getAuthority null
	getPath infobot@
example.com
	getQuery body=send%20current-issue%0D%0Asend%20index
	getFile infobot@
example.com?body=send%20current-issue%0D%0Asend%20index
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected]?In-Reply-To=%3C3469A91.D10AF4C@
   example.com%3E
	getAuthority null
	getPath [email protected]
	getQuery In-Reply-To=%3C3469A91.D10AF4C@
   example.com%3E
	getFile [email protected]?In-Reply-To=%3C3469A91.D10AF4C@
   example.com%3E
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected]?body=subscribe%20bamboo-l
	getAuthority null
	getPath [email protected]
	getQuery body=subscribe%20bamboo-l
	getFile [email protected]?body=subscribe%20bamboo-l
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected][email protected]&body=hello
	getAuthority null
	getPath [email protected]
	getQuery [email protected]&body=hello
	getFile [email protected][email protected]&body=hello
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:[email protected][email protected]?body=hello
	getAuthority null
	getPath [email protected][email protected]
	getQuery body=hello
	getFile [email protected][email protected]?body=hello
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:gorby%[email protected]
	getAuthority null
	getPath gorby%[email protected]
	getQuery null
	getFile gorby%[email protected]
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:"not@me"@example.org
	getAuthority null
	getPath "not@me"@example.org
	getQuery null
	getFile "not@me"@example.org
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:"oh\\no"@example.org
	getAuthority null
	getPath "oh\\no"@example.org
	getQuery null
	getFile "oh\\no"@example.org
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null
mailto:"\\\"it's\ ugly\\\""@example.org
	getAuthority null
	getPath "\\\"it's\ ugly\\\""@example.org
	getQuery null
	getFile "\\\"it's\ ugly\\\""@example.org
	getHost 
	getProtocol mailto
	getRef null
	getUserInfo null

So we may be able to use this to split the URI into its parts and run each part through the appropriate codecs?

What do you think?
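Splitting via URL and decoding each piece on its own might look like the following sketch. It substitutes plain UTF-8 percent-decoding for the "appropriate" ESAPI codecs, MailtoCanonicalizer is a hypothetical name, and it assumes Java 10+ for URLDecoder.decode(String, Charset). Given an input in the form the first test uses, such as the illustrative mailto:doc@example.com?subject=Doc,%20do%20da%20dance, it returns mailto:doc@example.com?subject=Doc, do da dance, which is the shape that test expects.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical sketch, not ESAPI code: split a mailto URL with java.net.URL,
// then percent-decode the address and each hfield name/value separately.
class MailtoCanonicalizer {

    // UTF-8 percent-decoding. '+' is escaped first so URLDecoder keeps it
    // literal: RFC 6068 does not treat '+' as a space. Requires Java 10+.
    static String decode(String s) {
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    static String canonicalize(String mailto) {
        URL url;
        try {
            url = new URL(mailto);
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("malformed URL: " + mailto, ex);
        }
        if (!"mailto".equalsIgnoreCase(url.getProtocol())) {
            throw new IllegalArgumentException("not a mailto URL: " + mailto);
        }
        // The JDK's mailto handler puts the address list in the path.
        StringBuilder out = new StringBuilder("mailto:").append(decode(url.getPath()));
        String query = url.getQuery(); // raw hfields, still percent-encoded
        if (query != null) {
            StringJoiner fields = new StringJoiner("&", "?", "");
            for (String hfield : query.split("&")) {
                int eq = hfield.indexOf('=');
                if (eq < 0) {
                    fields.add(decode(hfield));
                } else {
                    fields.add(decode(hfield.substring(0, eq)) + "="
                            + decode(hfield.substring(eq + 1)));
                }
            }
            out.append(fields);
        }
        return out.toString();
    }
}
```

One caveat: decoding %26 or %3D back into a literal & or = makes the reassembled string ambiguous, so a real canonicalizer would probably need to re-encode delimiters in each part after decoding and validating it.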

jeremiahjstacey, Oct 02 '19 22:10