libisabelle icon indicating copy to clipboard operation
libisabelle copied to clipboard

Codec.string should fail on chars >=256

Open dominique-unruh opened this issue 6 years ago • 4 comments
trafficstars

Currently, Codec.string.encode will send any string passed to it to Isabelle.

However, Isabelle can only handle character codes 0-255. Anything else leads to a Chr exception being thrown in Isabelle. (By function Char.chr.)

For example system.invoke(Operation.Hello)("α") will fail.

I think Codec.string.encode should already raise an exception if the input contains chars >=256 since otherwise one gets a hard to track down exception (The exception from Char.chr comes without any information why this exception is raised.)

dominique-unruh avatar May 08 '19 17:05 dominique-unruh

I'm a little surprised. I thought that everything above 128 gets escaped:

https://github.com/larsrh/libisabelle/blob/f9bf9bdc2b9f74e80681979169b00749c33033ba/modules/libisabelle/src/main/scala/Codec.scala#L57

larsrh avatar May 08 '19 20:05 larsrh

It's hard to be sure because I don't know how to see a stack trace for the ML side. But I suspect the following happens:

  • libisabelle produces properly escaped XML, e.g., ÈӒ
  • The ML code decodes the XML. When it encounters È, it calls Char.chr 200 to get the first character. This succeeds. Then it encounters Ӓ, it calls Char.chr 1234. This raises an exception because chars in ML cannot be >256.

As far as I understand, strings in ML cannot have code positions >=256, so there is no valid interpretation of ÈӒ (unless we encode the strings in UTF8 but I think that's not what is done).

Usually, one should not need non-ASCII strings in Isabelle anyway because it uses \<...> sequences for all non-ASCII symbols anyway. So an error seems to be the right response.

(One can convert a certain fragment of Unicode to symbols and back, https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L566 and https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L547 contain code for that.)

dominique-unruh avatar May 08 '19 20:05 dominique-unruh

Okay, I can see that now. Could you PR the change?

larsrh avatar May 09 '19 20:05 larsrh

To make a PR I would have to set up the whole libisabelle build. Instead, here is just the code fragment that I would replace Codec.string with:

  private def check256(str:String): String = {
    if (str.exists(_>=256))
      throw new RuntimeException // TODO put here whatever exception is appropriate according to the design of libisabelle
    str
  }
  implicit def string: Codec[String] = text[String](check256, Some.apply, "string").tagged("string")

dominique-unruh avatar May 13 '19 13:05 dominique-unruh