URL handling for escaped characters inconsistent between routes and requests

Open mdales opened this issue 4 weeks ago • 4 comments

Dream seems to handle URL encoding inconsistently between route registration and request handling, which caused me a lot of confusion and debugging. Could this be made more consistent, or am I just using things wrong?

Current Behaviour

When registering routes with Dream, the route patterns must contain unencoded (raw UTF-8) characters:

Dream.get "/photos/södermalm_pride/" handler  (* This works *)
Dream.get "/photos/s%C3%B6dermalm_pride/" handler  (* This does NOT match *)

However, when Dream logs incoming requests or when inspecting Dream.target, the paths are shown in their escaped form:

29.11.25 09:23:47.678    dream.logger  INFO REQ 1 GET /photos/s%C3%B6dermalm_pride/ 127.0.0.1:52990 fd 8 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.1 Safari/605.1.15

And the request here was from a link that I generated in the escaped form, so it came from:

              <a href="/photos/s%C3%B6dermalm_pride/">
                <img loading="lazy" src="/photos/s%C3%B6dermalm_pride/thumbnail.jpg" srcset="/photos/s%C3%B6dermalm_pride/thumbnail@2x.jpg 2x,
                    /photos/s%C3%B6dermalm_pride/thumbnail.jpg 1x" title="Södermalm pride" width="271" height="350" alt="Ett foto av en pride flagga över en väg på Södermalm på en solig dag.">
              </a>

It looks to me as though Dream is decoding the URL before matching it against routes, even though I'm being consistent: the links my code generates are escaped, and the escaped form is what I register for the route.

Expected Behaviour

I'd expect that I'd use the escaped form generally everywhere, given that this is what URLs require as I understand it.

I store URLs using Uri.t and render them with Uri.to_string, but now when I add routes I have to wrap them with Uri.pct_decode before I pass them to Dream routes.
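A minimal sketch of that wrapping step, assuming the `uri` opam package. The helper name `route_of_uri` is hypothetical; `Uri.of_string`, `Uri.path`, and `Uri.pct_decode` are the library's own functions.

```ocaml
(* Requires the `uri` opam package. *)

(* Hypothetical helper: convert a stored Uri.t into the raw UTF-8 string
   that Dream's router matched in the report above. Uri.path returns the
   percent-encoded path, so it is decoded before being used as a pattern. *)
let route_of_uri (uri : Uri.t) : string =
  Uri.pct_decode (Uri.path uri)

let () =
  let gallery = Uri.of_string "/photos/s%C3%B6dermalm_pride/" in
  (* The link is rendered in escaped form, the route in raw form. *)
  Printf.printf "link:  %s\n" (Uri.to_string gallery);
  Printf.printf "route: %s\n" (route_of_uri gallery)
```

The result would then be passed to the router, for example `Dream.get (route_of_uri gallery) handler`. One caveat of this sketch: a path containing an escaped slash (`%2F`) would decode to a literal `/` and change the number of segments.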

Steps to repro

let () =
  Dream.run
  @@ Dream.logger
  @@ Dream.router [
    Dream.get "/photos/södermalm_pride/" (fun _ ->
      Dream.html "Unencoded route works!");
    Dream.get "/photos/s%C3%B6dermalm_pride/" (fun _ ->
      Dream.html "Encoded route works!");
    Dream.get "/**" (fun request ->
      Dream.log "Path received: %s" (Dream.target request);
      Dream.empty `Not_Found);
  ]

When requesting /photos/södermalm_pride/:

  • The first route (unencoded) matches, regardless of the order in which I register the escaped and unescaped routes
  • Logs show: INFO REQ 1 GET /photos/s%C3%B6dermalm_pride/ 127.0.0.1:53018
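The observed behaviour is consistent with the request path being percent-decoded before it is compared against the registered patterns. The sketch below illustrates that idea using only the standard library. It is not Dream's implementation, and the names `hex_value` and `pct_decode` are hypothetical.

```ocaml
(* Illustration only: a minimal percent-decoder, not Dream's router code. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let pct_decode s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then begin
      match s.[i] with
      | '%' when i + 2 < n ->
        (match hex_value s.[i + 1], hex_value s.[i + 2] with
         | Some hi, Some lo ->
           (* A valid %XX escape becomes one raw byte. *)
           Buffer.add_char buf (Char.chr (hi * 16 + lo));
           go (i + 3)
         | _ ->
           (* A malformed escape is kept as-is. *)
           Buffer.add_char buf '%';
           go (i + 1))
      | c ->
        Buffer.add_char buf c;
        go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let () =
  let target = "/photos/s%C3%B6dermalm_pride/" in
  let decoded = pct_decode target in
  (* Only the raw UTF-8 pattern equals the decoded target. *)
  Printf.printf "raw pattern matches:     %b\n"
    (String.equal decoded "/photos/södermalm_pride/");
  Printf.printf "encoded pattern matches: %b\n"
    (String.equal decoded "/photos/s%C3%B6dermalm_pride/")
```

Under this assumption the escaped pattern can never match, because after decoding no `%C3%B6` sequence remains in the path being compared.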

mdales avatar Nov 29 '25 09:11 mdales

This is my current workaround for this :)

https://github.com/mdales/webplats/commit/4bd3566a7444a6c41a65cd1fdaf6259386db4794

mdales avatar Nov 29 '25 09:11 mdales

And the request here was from a link that I generated in the escaped form,

Why not generate unencoded links and let browsers handle the encoding? E.g.,

<a href="/photos/södermalm_pride/">
  <img
    loading="lazy"
    src="/photos/södermalm_pride/thumbnail.jpg"
    srcset="/photos/södermalm_pride/thumbnail@2x.jpg 2x, /photos/södermalm_pride/thumbnail.jpg 1x"
    title="Södermalm pride"
    width="271"
    height="350"
    alt="Ett foto av en pride flagga över en väg på Södermalm på en solig dag."
  >
</a>

Browsers have this functionality built in, so it will just automatically work with Dream.

yawaramin avatar Nov 30 '25 17:11 yawaramin

I could, but that still leaves me with what feels like an odd discrepancy: what appears on the wire from the browser, what appears in the Caddy or Nginx logs, what appears in the Dream logs, and what I give Dream are all identical, yet things don't work because of some invisible decoding step within Dream. This is why I wasted a lot of time trying to work out where in my stack things were failing. Everything said I was being consistent, but there was an invisible stage that I had no insight into, even when I looked at the Dream logs.

mdales avatar Nov 30 '25 20:11 mdales

Ok, I think this is an issue of missing documentation. We need to be explicit in the docs that Dream routes need to be un-encoded strings.

The URL wire format is expected to be encoded, that's specified. But on the client side and the server side, we want to deal with the un-encoded string, not the percent-encoded URI wire format string.

In server logs, we want to print the encoded form because just logging out any un-encoded string received from clients could be a security issue or at the very least mess with storage and processing systems for the logs. Imagine attackers crafting URLs with special Unicode characters that overflow, or look deceptive, or even just print coloured emojis when viewed. Not a great idea to allow that.
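To make the logging concern concrete, here is a small sketch of escaping a decoded path before printing it, using only the standard library. The helper `log_safe` is hypothetical and is not how Dream's logger is implemented; it only shows why an encoded form is the safer thing to write to a log.

```ocaml
(* Hypothetical helper: percent-encode every byte outside visible ASCII,
   plus '%' itself, so that control characters, escape sequences, and
   non-ASCII bytes never reach the log verbatim. *)
let log_safe s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x21 || code > 0x7E || c = '%' then
        Buffer.add_string buf (Printf.sprintf "%%%02X" code)
      else Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  (* A raw UTF-8 path is printed in its escaped form. *)
  print_endline (log_safe "/photos/södermalm_pride/");
  (* An embedded terminal escape sequence is neutralised. *)
  print_endline (log_safe "/photos/\x1b[31mred")
```

With this approach a log line contains only visible ASCII, at the cost of the logged path no longer looking like the raw route string.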

yawaramin avatar Nov 30 '25 20:11 yawaramin