Use getPath instead of getRawPath to prevent double encoding of the URI
Use getPath instead of getRawPath to prevent double encoding of the URI. This fixes issue #3482.
See https://github.com/spring-cloud/spring-cloud-gateway/pull/3437
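For context, here is a minimal JDK-only sketch of how a double encoding can arise when an already-encoded path is fed into something that encodes again. It does not use the gateway's code path (which goes through UriComponentsBuilder); it only relies on the documented behaviour of the multi-argument java.net.URI constructors, which always quote the percent character. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DoubleEncodingDemo {
    // Rebuilds a URI from the scheme and host of `base` plus the given path.
    // The multi-argument URI constructor quotes illegal characters and always
    // quotes '%', so an already-encoded path is encoded a second time.
    static String rebuild(URI base, String path) {
        try {
            return new URI(base.getScheme(), base.getHost(), path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI incoming = URI.create("http://localhost/te%20st");
        // Raw (still encoded) path: "%20" becomes "%2520".
        System.out.println(rebuild(incoming, incoming.getRawPath()));
        // Decoded path: the space is encoded exactly once.
        System.out.println(rebuild(incoming, incoming.getPath()));
    }
}
```

The first call yields http://localhost/te%2520st and the second http://localhost/te%20st, which is the general shape of the problem this PR is about.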
I'm not sure this is going to work. Consider the URL "http://example.com/foo–bar foo" (that's an en dash, not a hyphen):
String url = "http://example.com/foo%E2%80%93bar%20foo";
URI uri = URI.create(url);
String path = uri.getPath();
URI uri2 = UriComponentsBuilder.fromUri(uri).replacePath(path).build().toUri();
uri2 is "http://example.com/foo–bar%20foo", so the space got re-encoded properly, but the en dash did not.
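As a small JDK-only illustration of the two views of the path being discussed (no Spring involved, and the class name is hypothetical): getRawPath returns the path exactly as it appeared in the URL, while getPath returns it with the percent-escapes decoded.

```java
import java.net.URI;

final class PathViews {
    // Path exactly as written in the URL, escapes intact.
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // Path with percent-escaped octets decoded as UTF-8.
    static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String url = "http://example.com/foo%E2%80%93bar%20foo";
        System.out.println(rawPath(url));     // still contains %E2%80%93 and %20
        System.out.println(decodedPath(url)); // contains the en dash and a literal space
    }
}
```

Anything that takes the decoded form as input has to re-encode both the space and the en dash afterwards, which is the step that goes wrong in the snippet above.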
You are right, it only solves the issue for ASCII characters, because it seems UriComponentsBuilder does not encode anything else. I'm not sure if there is an easy solution. I have seen your PR (#3437), which would encode the space and the en dash correctly. But for rewrite path it would apply the regex to the encoded URL, which I guess no one expects when writing a rewrite rule.
@rworsnop @spencergibb Do you have an idea how we could support both: applying the rewrite regex to the decoded path and still maintaining support for non-ASCII characters?
@rworsnop it seems that neither the JDK's URI nor Spring's UriComponentsBuilder escapes Unicode characters outside of ASCII.
jshell> new URI("http", "localhost", "/te–st test", null)
$32 ==> http://localhost/te–st%20test
UriComponentsBuilder.fromUri(request.uri()).replacePath("/te–st test").build().toUri();
http://localhost:8080/te–st%20test
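For the JDK half of this observation, a hedged sketch (hypothetical class and method names): the multi-argument URI constructor quotes the space but leaves non-ASCII characters in place, and toASCIIString() is the documented call that additionally percent-encodes them as UTF-8. This says nothing about what UriComponentsBuilder or the HTTP client do internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class JdkUriEscaping {
    // Builds http://localhost<path> with the multi-argument constructor,
    // which quotes illegal characters such as the space.
    static URI build(String path) {
        try {
            return new URI("http", "localhost", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("/te\u2013st test");
        // Space is quoted, the en dash is kept as a Unicode character.
        System.out.println(uri.toString());
        // Non-ASCII characters are now escaped as UTF-8 octets as well.
        System.out.println(uri.toASCIIString());
    }
}
```

So with the plain JDK the en dash is only escaped once toASCIIString() is called; toString() keeps it as-is.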
Yet the code from this PR still works for Unicode characters, because it seems the HTTP client or something after the filter escapes the en dash correctly. But I found another issue: while the URI from the UriComponentsBuilder with te–st is sent as te%E2%80%93st and te st is sent as te%20st, there is an issue if you have both! If we leave the filter with the URI te–st test, it gets sent to the downstream server as te%E2%80%93st%2520test, so it is double-encoded again. Something in the Spring chain is "detecting" the Unicode character and decides to encode everything again. Maybe it is also somewhere in the JDK.
I've found a solution: it seems .encode() from the UriComponentsBuilder encodes everything correctly. It does not double-encode the space, and it still encodes the Unicode characters (see tests).
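To make the "decode once, encode once" property concrete, here is a JDK-only analogue. It is not the Spring .encode() call from the PR; it swaps in the java.net.URI constructor plus toASCIIString() and only handles scheme, host, port and path (query and fragment are dropped, and an encoded slash such as %2F would not survive the round trip). The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class EncodeOnceRoundTrip {
    // Decodes the path of an encoded URL and encodes it again exactly once.
    static String reencode(String encodedUrl) {
        URI in = URI.create(encodedUrl);
        try {
            // getPath() hands over the decoded path; the constructor quotes
            // illegal characters and toASCIIString() escapes non-ASCII ones.
            URI out = new URI(in.getScheme(), null, in.getHost(), in.getPort(),
                    in.getPath(), null, null);
            return out.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/te%E2%80%93st%20test";
        // The round trip reproduces the input instead of producing %2520.
        System.out.println(reencode(url).equals(url));
    }
}
```

The check a test for the filter would want is the same: a path containing both a space and an en dash should come out with each escaped exactly once.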
@spencergibb tests have also been added
@spencergibb Would it be possible to get either this fix or #3437 included in the next release? We are also encountering this issue, with spaces (and other special characters) being encoded twice.
related https://github.com/spring-cloud/spring-cloud-gateway/issues/3657#issue-2767306840
I'm going to actively look at this class of problem in the next few weeks.
This is also addressed in https://github.com/spring-cloud/spring-cloud-gateway/pull/3658 which appears to address another double encoding issue, so I am closing this PR.
@ryanjbaxter there's a difference that matters only for the rewritePath filter: the regex is applied to the path, and I think everyone would assume that the path is not escaped when writing a regex. That's why I used getPath, so the method internally uses the unescaped string to apply the regex.
Should I try to add the code from here into #3658? Or should we leave rewritePath out of #3658 and use this PR only for this filter?
Alternatively, I think my approach here would also work for all other filters, so we could also add the query-escaping code from #3658 here, but I haven't tested that.
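A minimal sketch of why the choice matters for a rewrite rule, using plain String.replaceAll rather than the actual filter code. The URL, regex and replacement are made-up examples and the class name is hypothetical.

```java
import java.net.URI;

final class RewriteOnDecodedPath {
    // Applies a rewrite rule (regex plus replacement) to a path string.
    static String rewrite(String path, String regex, String replacement) {
        return path.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/old%20api/items");
        // A rule written the way a user would naturally write it, with a literal space.
        String regex = "/old api/(?<segment>.*)";
        String replacement = "/new/${segment}";

        // Decoded path "/old api/items": the rule matches.
        System.out.println(rewrite(uri.getPath(), regex, replacement));
        // Raw path "/old%20api/items": the rule does not match, the path is unchanged.
        System.out.println(rewrite(uri.getRawPath(), regex, replacement));
    }
}
```

With the raw path the user would have to write %20 inside the regex, which is the surprise described above.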
Now, #3658 has been merged without adding any fix for rewritePath (or any other filter except setPath and stripPrefix).
Can we reopen this?
Yes my bad, sorry about that!
@jensmatw Could you make the PR against the 4.1.x branch and resolve the merge conflicts?
@jensmatw can you sign your commits so the DCO check passes?