play-ws
play-ws copied to clipboard
StandaloneWSClient.url(String) permits inconsistent states with query params
Summary
Behavior of query parameters in wsClient.url(String)
is not well defined and permits inconsistent states. For example, wsClient.url("http://postman-echo.com/get?pipe=|").execute()
successfully sends the request as http://postman-echo.com/get?pipe=%7C"
, but wsRequest.queryString
returns an empty map, and wsRequest.uri
throws a URISyntaxException
.
This makes methods uri
and queryString
hard to reason about and use, since they may differ from what is actually sent or throw exceptions.
Behavior Example
/**
* Demonstrates inconsistencies in [[StandaloneAhcWSClient.url()]].
*/
object UrlTest extends App {
implicit val actorSystem = ActorSystem()
implicit val materializer = ActorMaterializer()
val wsClient = StandaloneAhcWSClient()
val wsRequest = wsClient.url("http://postman-echo.com/get?pipe=|")
val wsResponse = Await.result(wsRequest.execute(), 1.minute)
println("Response body: " + wsResponse.body) // success!
println("Request Query params: " + wsRequest.queryString) // empty, but the server disagrees.
try {
println("URI: " + wsRequest.uri) // throws! and yet, AHC figured it out somehow.
} catch {
case t: URISyntaxException => t.printStackTrace(System.out)
}
finally {
wsClient.close()
materializer.shutdown()
actorSystem.terminate()
}
}
Prints:
Response body: {"args":{"pipe":"|"},"headers":{"host":"postman-echo.com","accept":"*/*","user-agent":"AHC/2.0","x-forwarded-port":"80","x-forwarded-proto":"http"},"url":"http://postman-echo.com/get?pipe=%7C"}
Request Query params: Map()
java.net.URISyntaxException: Illegal character in query at index 33: http://postman-echo.com/get?pipe=|
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3111)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at play.api.libs.ws.ahc.StandaloneAhcWSRequest.uri$lzycompute(StandaloneAhcWSRequest.scala:59)
at play.api.libs.ws.ahc.StandaloneAhcWSRequest.uri(StandaloneAhcWSRequest.scala:52)
at play.api.libs.UrlTest$.delayedEndpoint$play$api$libs$UrlTest$1(UrlTest.scala:29)
at play.api.libs.UrlTest$delayedInit$body.apply(UrlTest.scala:16)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
at play.api.libs.UrlTest$.main(UrlTest.scala:16)
at play.api.libs.UrlTest.main(UrlTest.scala)
Wireshark sees:
GET /get?pipe=%7C HTTP/1.1
Host: postman-echo.com
Accept: */*
User-Agent: AHC/2.0
Observations
The scaladoc is unclear. "url" implies https://tools.ietf.org/html/rfc1738, and yet "base URL"
implies maybe not?
/**
* Generates a request. Throws IllegalArgumentException if the URL is invalid.
*
* @param url The base URL to make HTTP requests to.
* @return a request
*/
@throws[IllegalArgumentException]
def url(url: String): StandaloneWSRequest
Tolerant or Strict?
I think we'd benefit from committing to one of these two approaches and documenting.
Tolerant
The behavior of org.asynchttpclient.util.UriEncoder.FIXING
could be pulled up to StandaloneAhcWSClient.url(String)
to create something sane, parsing query params, perhaps decoding the values, and putting them in the queryString
field. From there, lazy val uri: URI
can figure it out and queryString
would reflect reality.
Downside is that StandaloneWSRequest.copy(url = "")
exists and would still allow for inconsistent states. At least it'd be better.
Scaladoc would be updated to indicate that the URI parameter means https://tools.ietf.org/html/rfc1738 and tolerantly accepts query string values either encoded or in plaintext.
Strict
StandaloneAhcWSRequest
would throw if query params are present in url
, which could be renamed baseUrl: String
, and defined as "everything up to the query params".
A consistency oriented implementation could add require(uri.getQuery == null)
which would force parsing of the `java.net.URI. This is a bit heavy to do every method call of the builder.
Alternatively, we could move this check into StandaloneAhcWSClient.validate().
Scaladoc would be updated to indicate that query params MUST NOT be part of the "baseUrl".
Workarounds
In my current production Play 2.5 services, I'm defensively decorating WSClient
to intercept the WSRequest
prior to execution, and running it through UriEncoder.FIXING
just as AHC does under the hood, and moving any query params into the WSRequest.queryString
field.
It's kludgy without https://github.com/playframework/play-ws/issues/264. I ended up requiring a WSClient and just rebuilding the whole request from scratch. I'd love for this to be handled upfront by the library so I don't have to second guess every request.
Relates to https://github.com/playframework/playframework/issues/7444.
@gmethvin & @marcospereira, I could use your guidance here. Would be nice to nail this down before v2.
Hi @htmldoug,
Thanks for the detailed and well-thought issue.
Right now, I think the tolerant approach is the recommended one. There are some other expectations that we can do around the API:
- We can elaborate that
queryString
is actually used to define additional query strings, besides the ones already present at the url. Changing the name to something likeadditionalQueryString
can make it clear that it is not related to what we have in the url. -
uri
should be consistent withexecute
andvalidate
. If it is possible to run the request, than it should be possible to calluri
.
Ideally, these changes should add comprehensive tests to ensure they are consistent.
Downside is that StandaloneWSRequest.copy(url = "") exists and would still allow for inconsistent states. At least it'd be better.
We can have a method like this:
def copy(url: String)
That does all the work necessary to ensure things are consistent.
Thanks for the direction. I'd be interested in submitting a PR for the tolerant route, having written one already around my existing use of the play25 WSClient. Implementation is here: https://gist.github.com/htmldoug/38cfa714b33123c6695ec272cfcae6c1 Before proceeding, I'd like for you to witness how gnarly it is and confirm that you really want it. Feel free to mark it up with comments for any changes you'd like to see.
Overloading copy(url: String)
can still be bypassed with req.copy(url = "/pipe=|", method = "POST")
. There would also be no validation if you call StandaloneAhcWSRequest.apply()
directly. Seems like a reasonable escape hatch to leave open anyway for performance-sensitive extreme cases. Now that we have a happy path of withUrl(String)
from #268 (which would apply validation/fixing), I think that's fine. (Thank you for merging, btw!)
Note that one consequence of removing the query params from the url: String
is that the url value you get back may be different from the one you put in.
val baseUrl: String = "http://example.com/path?a=1"
ws.url(baseUrl).url == baseUrl // false. yields "http://example.com/path" perhaps surprising.
ws.queryString("a") // this would work now though. yay!
Overall, I think this is a sensible way to go. Just give me the green light and I'll put up a PR.
On the meaning of URL -- URL parsing has been a mixed bag overall. RFC 1738 is effectively obsoleted by RFC 3986 and RFC 7320 and so using "URI" over "URL" would be more accurate in general.
There are a couple of posts that talk about parsing URIs:
- http://blog.palominolabs.com/2013/10/03/creating-urls-correctly-and-safely/index.html
- https://www.talisman.org/~erlkonig/misc/lunatech%5ewhat-every-webdev-must-know-about-url-encoding/
There are external libraries that will augment the functionality of WS: galimatias validates URLs, and scala-url-builder and url-builder will build up guaranteed valid URLs.