vert.x icon indicating copy to clipboard operation
vert.x copied to clipboard

The uri() method and the absoluteURI() method in HttpServerRequestImpl are inconsistent in judging the legality of uri.

Open aruis opened this issue 5 years ago • 14 comments

Vert.x 3.6.2

This is a very simple server code

    public void start() throws Exception {
        vertx.createHttpServer().requestHandler( req -> {
            System.out.println( req.uri());
            System.out.println( req.absoluteURI());

            req.response()
                    .putHeader("content-type", "text/plain")
                    .end("Hello from Vert.x!");
        }).listen(8080);
    }

When I request the address http://127.0.0.1:8080/test?a={1} in the browser, the uri() method will not report an error, but the absoluteURI() print error will return a null value. This inconsistent performance is not reasonable. The contents of the error are as follows:

/test?a={1}
17:26:41.548 [vert.x-eventloop-thread-0] ERROR io.vertx.core.http.impl.HttpServerRequestImpl - Failed to create abs uri
java.net.URISyntaxException: Illegal character in query at index 8: /test?a={1}
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.parseHierarchical(URI.java:3111)
	at java.net.URI$Parser.parse(URI.java:3063)
	at java.net.URI.<init>(URI.java:588)
	at io.vertx.core.http.impl.HttpUtils.absoluteURI(HttpUtils.java:362)
	at io.vertx.core.http.impl.HttpServerRequestImpl.absoluteURI(HttpServerRequestImpl.java:379)
...
...
...
null

The first line is the result returned by the uri() method. The last line is the result returned by the absoluteURI() method.

aruis avatar Jan 10 '19 09:01 aruis

this sounds like a bug rather that being inconsistent.

uri returns the request URI, i.e what the request sent in the status line

the absolute URI is the URI that is computed from various parts using the host, etc... it seems there is a bug in this case

vietj avatar Jan 10 '19 10:01 vietj

Yes. I think it is. I will try to see the source code to see if it can be fixed.

aruis avatar Jan 10 '19 11:01 aruis

please find the nature of the bug and rename this issue after it

vietj avatar Jan 10 '19 11:01 vietj

ok

aruis avatar Jan 10 '19 12:01 aruis

Looking at the HttpServer code this seems to be caused by the fact that uri() returns the String value of the underlying netty request which is created by the HttpObjectDecoder. This does no validtion that this is String is in anyway a valid java.net.URI, it simply split the initial line by whitespace. .absoluteURI() internally calls HttpUtils.absoluteURI(...) which validates the result of uri() by calling new URI(req.uri()). The URI is not needed apart from validation in this function.

To fix this Bug we either need to always validate what is returned by .uri() or stop validation in .absoluteURI() @vietj What would you suggest ?

kerko avatar Jan 10 '19 13:01 kerko

we don't do validation on purpose, error happens because we use java.net.URI object to parse, so we should parse differently I guess

vietj avatar Jan 10 '19 14:01 vietj

The problem with absoluteURI() is mainly because there are unsafe characters in the get request. This was discussed in https://github.com/eclipse-vertx/vert.x/issues/1997. But I think that since uri() returns correctly, then absoluteURI() must also be returned in the correct way. After careful study, I think it is the impact of this code: https://github.com/eclipse-vertx/vert.x/blob/f68ae15171053d343590e54884ec377f095ba6b3/src/main/java/io/vertx/core/http/impl/HttpUtils.java#L360-L366

I don't understand why new URI(req.uri()), Then get the value of uri.getScheme(), because uri.getScheme() always returns null. Going back to the issues we are currently discussing, because the new URI() encounters an unsafe character, then the JDK throws an exception. I have provided optimized code and unit tests here: https://github.com/aruis/vert.x/commit/c9c8813e5e6a0ab173e1155f1e7aa700ee7a4ee7 if you feel it is appropriate, I will pull request. Please forgive my lame English. Looking forward to your reply @vietj

aruis avatar Jan 11 '19 01:01 aruis

  • But I think that since uri() returns correctly, then absoluteURI() must also be returned in the correct way. : I don't agree with you, both are different.
  • uri.getScheme() does not always return null, a request URI can be an absolute URL, so you need to take it in account and test it. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html 5.1.2 : Request-URI = "*" | absoluteURI | abs_path | authority

vietj avatar Jan 11 '19 07:01 vietj

I understand that uri can contain scheme, but I have not tested in vert.x that uri.getScheme() is not null.

aruis avatar Jan 11 '19 07:01 aruis

we don't do validation on purpose

If validation isn't needed here, then my intuition would be that the existing code

https://github.com/eclipse-vertx/vert.x/blob/cbf38bfc2cfcceb4ccbae89ba6619f0a5158dff8/src/main/java/io/vertx/core/http/impl/HttpUtils.java#L364-L379

could be adapted by replacing

    URI uri = new URI(req.uri());
    String scheme = uri.getScheme();
    if (scheme != null && (scheme.equals("http") || scheme.equals("https"))) {
      absoluteURI = uri.toString();

with

    String uri = req.uri();
    if (uri.startsWith("http://") || uri.startsWith("https://")) {
      absoluteURI = uri;

to skip parsing the URI.

In general, if not wanting to let HttpServerRequest.absoluteURI throw an URISyntaxException, I think it would be better to return the unvalidated URI string there, instead of null. Then the invoking code can do validation if needed.

The current logging on ERROR level here

https://github.com/eclipse-vertx/vert.x/blob/29cc5bd748cd5e352c40e7f86352cac129d40c0a/src/main/java/io/vertx/core/http/impl/HttpServerRequestImpl.java#L372-L374

is something that we found to cause unnecessary clutter in production log output, especially when req.absoluteURI() gets called multiple times during request processing (e.g. for log/tracing output).

calohmn avatar May 11 '21 14:05 calohmn

@vietj Any thoughts on the above? Doing the "startsWith("http(s)://" check instead of parsing the URI to check for an absolute URI.

calohmn avatar May 20 '21 05:05 calohmn

I need to look at this again @calohmn

vietj avatar May 20 '21 11:05 vietj

I've started to read this again and gather information, here is what I found:.

  • { and } are not valid character in query string of an URI
  • Safari will anyway send those char in clear text in the request URI (I checked with Wireshark), other browsers might not
  • the URI rfc3986 does not say anything specific about those chars (they are simply not allowed)
  • the URL rfc1738 mention { and } as unsafe
    • All unsafe characters must always be encoded within a URL.
    • These characters are "{", "}", "|", "", "^", "~","[", "]", and "`".
  • there are interesting concerns about this
    • https://github.com/SergioBenitez/Rocket/issues/924
    • https://stackoverflow.com/questions/53766725/curl-does-not-work-with-urls-with-curly-braces-in-parameters
  • the HttpServerRequest#uri() method returns the HTTP request uri as provided by the HTTP request with no interpretation
  • the HttpServerRequest#absoluteURI() method returns a best effort to compute an URL that was called by the client, e.g it might use the HTTP server bind address when the request header host is not present)

So it looks like the current implementation is correct as it will not return an invalid URL, however it might not be the best choice for the application.

I think we could add some leniency here and add a boolean flag that will validate or not the returned absolute URI and the default implementation would be strict (no breaking change).

vietj avatar Jun 02 '21 12:06 vietj

scoped to 4.1.1 - back port might be possible

vietj avatar Jun 02 '21 12:06 vietj

Fixed by https://github.com/eclipse-vertx/vert.x/pull/5158

tsegismont avatar Mar 20 '24 13:03 tsegismont