scala-uri
Parse percent encoded query parameter using a different charset than UTF-8
I'm trying to parse an ISO-8859-1 encoded relative URL, but I can't get the correctly decoded string out of this value:
scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn
scala> v.query.param("param")
val res9: Option[String] = Some(r�n)
scala> v.toStringRaw
val res10: String = /path?param=r?n
The param value should be rón. So I tried this example, but I can't seem to get it to work bidirectionally for a custom charset. For UTF-8 it works:
scala> import io.lemonlabs.uri.RelativeUrl
import io.lemonlabs.uri.RelativeUrl
scala> import io.lemonlabs.uri.config.UriConfig
import io.lemonlabs.uri.config.UriConfig
scala> val v1 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")(UriConfig(charset = "GB2312"))
val v1: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%CD%F8%D6%B7
scala> val v2 = RelativeUrl.parse(v1.toString)(UriConfig(charset = "GB2312"))
val v2: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%3F%3F%3F
scala> val v3 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")
val v3: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80
scala> val v4 = RelativeUrl.parse(v3.toString)
val v4: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80
I'm testing the round trip because that is what I need. v3 and v4 are the same, which makes sense, but v1 and v2 are not. Am I missing something? What is the proper way to parse a percent-encoded representation that was encoded using a custom character set?
What I'm saying is that it works properly this way:
scala> val v = RelativeUrl.parse("/path?param=rón")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%F3n
scala> v.toString
val res11: String = /path?param=r%F3n
scala> v.toStringRaw
val res12: String = /path?param=rón
But not the other way around:
scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn
scala> v.toString
val res13: String = /path?param=r%3Fn
scala> v.toStringRaw
val res14: String = /path?param=r?n
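The asymmetry above is consistent with the percent-decoded byte being interpreted as UTF-8 and then written back out in ISO-8859-1. The following is a minimal sketch using only the JDK, not scala-uri itself; the object name CharsetMismatch is made up for illustration, and it only reproduces the symptom rather than the library's actual code path.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper, JDK only: reproduces the r%F3n -> r%3Fn symptom.
object CharsetMismatch {
  // The three bytes that "r%F3n" stands for after percent-decoding.
  val raw: Array[Byte] = Array('r'.toByte, 0xF3.toByte, 'n'.toByte)

  // Interpreted in the intended charset, 0xF3 is the letter ó.
  val asLatin1: String = new String(raw, ISO_8859_1)

  // Interpreted as UTF-8, 0xF3 is a malformed sequence and becomes U+FFFD.
  val asUtf8: String = new String(raw, UTF_8)

  // U+FFFD has no ISO-8859-1 mapping, so re-encoding substitutes '?' (0x3F),
  // which is where a %3F in the rendered URL could come from.
  val reencoded: Array[Byte] = asUtf8.getBytes(ISO_8859_1)

  // Hex view of the re-encoded bytes.
  val hex: String = reencoded.map(b => f"${b & 0xFF}%02X").mkString(" ")
}
```

Under that assumption, the replacement character in the param value and the ? in toStringRaw are two views of the same lossy step: once 0xF3 has been decoded as UTF-8, the original byte cannot be recovered.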
OK, I think I found a workaround based on PercentDecoder, where the UTF-8 hardcoding takes place:
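import io.lemonlabs.uri.decoding.PercentDecoder
// Decode the percent escapes to raw bytes, then interpret those bytes as ISO-8859-1
val queryDecoder = PercentDecoder
new String(queryDecoder.decodeBytes("/path?param=r%F3n", "ISO-8859-1"), "ISO-8859-1")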
val res21: String = /path?param=rón
I can then pass this string as the input to RelativeUrl.parse.
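For readers who want the same pre-decoding step without relying on the library's decoder, here is a rough JDK-only sketch. The object CharsetPercentDecoding and its decode method are hypothetical names, not part of scala-uri, and invalid escapes are simply passed through unchanged.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.Charset

// Hypothetical helper, JDK only: percent-decodes a string in a given charset.
object CharsetPercentDecoding {
  /** Turns %XX escapes into raw bytes, then reads all bytes in `charset`. */
  def decode(s: String, charset: Charset): String = {
    val out = new ByteArrayOutputStream(s.length)
    var i = 0
    while (i < s.length) {
      // -1 when there is no complete, valid hex pair after a '%'.
      val hi = if (s.charAt(i) == '%' && i + 2 < s.length) Character.digit(s.charAt(i + 1), 16) else -1
      val lo = if (hi >= 0) Character.digit(s.charAt(i + 2), 16) else -1
      if (hi >= 0 && lo >= 0) {
        // A valid escape contributes exactly one raw byte.
        out.write((hi << 4) | lo)
        i += 3
      } else {
        // Anything else is copied as its encoding in the same charset.
        val cp = s.codePointAt(i)
        val bytes = new String(Character.toChars(cp)).getBytes(charset)
        out.write(bytes, 0, bytes.length)
        i += Character.charCount(cp)
      }
    }
    new String(out.toByteArray, charset)
  }
}
```

A call such as CharsetPercentDecoding.decode("/path?param=r%F3n", java.nio.charset.StandardCharsets.ISO_8859_1) produces a string that can be handed to RelativeUrl.parse. One caveat with decoding the whole URL up front: escaped reserved characters such as %26 or %3D become literal & and =, so the query may be split differently when the result is parsed.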
Thanks for raising and figuring out where the shortcoming is 🙇‍♂️
I'm thinking we should make the PercentDecoder charset configurable.