scala-uri

Parse percent encoded query parameter using a different charset than UTF-8

Open conet opened this issue 2 years ago • 3 comments

I'm trying to parse an ISO-8859-1 percent-encoded relative URL, but I can't seem to get the properly decoded string out of this value:

scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn

scala> v.query.param("param")
val res9: Option[String] = Some(r�n)

scala> v.toStringRaw
val res10: String = /path?param=r?n

The param value should be rón. So I tried this example, but I can't seem to get the round trip to work properly for a custom charset, although it works for UTF-8:

scala> import io.lemonlabs.uri.RelativeUrl
import io.lemonlabs.uri.RelativeUrl

scala> import io.lemonlabs.uri.config.UriConfig
import io.lemonlabs.uri.config.UriConfig

scala> val v1 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")(UriConfig(charset = "GB2312"))
val v1: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%CD%F8%D6%B7

scala> val v2 = RelativeUrl.parse(v1.toString)(UriConfig(charset = "GB2312"))
val v2: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%3F%3F%3F

scala> val v3 = RelativeUrl.parse("/uris-in-scala.html?chinese=网址")
val v3: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80

scala> val v4 = RelativeUrl.parse(v3.toString)
val v4: io.lemonlabs.uri.RelativeUrl = /uris-in-scala.html?chinese=%E7%BD%91%E5%9D%80

I'm testing both directions because that is what I need. v3 and v4 are the same, which makes sense, but v1 and v2 are not. Am I missing something? What is the proper way to parse a percent-encoded representation that was produced using a custom character set?
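
For comparison, here is a minimal sketch of the round trip being asked for, using only the JDK's java.net.URLEncoder and java.net.URLDecoder rather than scala-uri. The object name CharsetRoundTrip and its two methods are made up for illustration. Note that these JDK classes implement form encoding (a space becomes +), so this shows the expected behaviour but is not a drop-in replacement for URL parsing.

```scala
import java.net.{URLDecoder, URLEncoder}

// Hypothetical helper for illustration only; not part of scala-uri.
object CharsetRoundTrip {
  // Percent-encode a value using the bytes of the given charset.
  def encode(value: String, charset: String): String =
    URLEncoder.encode(value, charset)

  // Percent-decode a value, interpreting the decoded bytes in the given charset.
  def decode(encoded: String, charset: String): String =
    URLDecoder.decode(encoded, charset)
}
```

With the same charset on both sides, decode(encode(x, cs), cs) should return x for any x that the charset can represent, which is the property the v1/v2 pair is expected to satisfy.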

conet, Oct 04 '22 19:10

What I'm saying is that it works properly this way:

scala> val v = RelativeUrl.parse("/path?param=rón")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%F3n

scala> v.toString
val res11: String = /path?param=r%F3n

scala> v.toStringRaw
val res12: String = /path?param=rón

But not the other way around:

scala> val v = RelativeUrl.parse("/path?param=r%F3n")(UriConfig(charset = "ISO-8859-1"))
val v: io.lemonlabs.uri.RelativeUrl = /path?param=r%3Fn

scala> v.toString
val res13: String = /path?param=r%3Fn

scala> v.toStringRaw
val res14: String = /path?param=r?n
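
One plausible reading of the r%3Fn output, sketched with plain JDK calls and no scala-uri code: if the byte 0xF3 is decoded as UTF-8, it becomes the replacement character U+FFFD, and re-encoding that character as ISO-8859-1 yields ? (0x3F). The object name Latin1VsUtf8 is made up for illustration, and whether the library follows exactly this path is an assumption.

```scala
import java.nio.charset.StandardCharsets

// Illustration only: shows what happens to the byte 0xF3 under two charsets.
object Latin1VsUtf8 {
  // The raw bytes that "r%F3n" percent-decodes to: 'r', 0xF3, 'n'.
  val raw: Array[Byte] = Array(0x72, 0xF3, 0x6E).map(_.toByte)

  // 0xF3 followed by 'n' is not a valid UTF-8 sequence, so the JDK decoder
  // substitutes the replacement character U+FFFD.
  val asUtf8: String = new String(raw, StandardCharsets.UTF_8)

  // In ISO-8859-1 every byte maps directly to one code point; 0xF3 is 'ó'.
  val asLatin1: String = new String(raw, StandardCharsets.ISO_8859_1)

  // U+FFFD cannot be represented in ISO-8859-1, so encoding falls back to '?' (0x3F).
  val reEncoded: Array[Byte] = asUtf8.getBytes(StandardCharsets.ISO_8859_1)
}
```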

conet, Oct 04 '22 19:10

OK, I think I found a workaround based on PercentDecoder where the UTF-8 hardcoding takes place:

val queryDecoder = PercentDecoder

new String(queryDecoder.decodeBytes("/path?param=r%F3n", "ISO-8859-1"), "ISO-8859-1")
val res21: String = /path?param=rón

I can use this as the input to RelativeUrl.parse.

conet, Oct 04 '22 19:10

Thanks for raising and figuring out where the shortcoming is 🙇‍♂️

I'm thinking we should make the PercentDecoder charset configurable.
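
As a rough sketch of what a charset-configurable decoder could look like, here is a standalone version that depends only on the JDK. The class name CharsetPercentDecoder and its behaviour for malformed escapes (passed through unchanged) are assumptions for illustration. This is not scala-uri's PercentDecoder, and a real change would have to fit the library's existing decoder and config types.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.Charset

// Hypothetical class for illustration only; not part of scala-uri.
final class CharsetPercentDecoder(charset: Charset) {

  // True if the character is a hexadecimal digit.
  private def isHex(c: Char): Boolean = Character.digit(c, 16) >= 0

  def decode(s: String): String = {
    val out = new ByteArrayOutputStream(s.length)
    var i = 0
    while (i < s.length) {
      if (s.charAt(i) == '%' && i + 2 < s.length &&
          isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
        // A well-formed %XX escape contributes exactly one raw byte.
        val hi = Character.digit(s.charAt(i + 1), 16)
        val lo = Character.digit(s.charAt(i + 2), 16)
        out.write((hi << 4) | lo)
        i += 3
      } else {
        // Any other character (including a malformed '%') is kept as-is,
        // encoded with the configured charset so the final decode is consistent.
        val cp = s.codePointAt(i)
        val bytes = new String(Character.toChars(cp)).getBytes(charset)
        out.write(bytes, 0, bytes.length)
        i += Character.charCount(cp)
      }
    }
    // Interpret all collected bytes in the configured charset, not a fixed UTF-8.
    new String(out.toByteArray, charset)
  }
}
```

The key design point is that the same charset is used both for literal characters and for the final byte-to-string step, so no UTF-8 assumption is baked in anywhere.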

theon, Oct 04 '22 20:10