libisabelle
libisabelle copied to clipboard
Codec.string should fail on chars >=256
Currently, Codec.string.encode will send any string passed to it to Isabelle.
However, Isabelle can only handle character codes 0-255. Anything else leads to a Chr exception being thrown in Isabelle. (By function Char.chr.)
For example system.invoke(Operation.Hello)("α") will fail.
I think Codec.string.encode should already raise an exception if the input contains chars >=256 since otherwise one gets a hard to track down exception (The exception from Char.chr comes without any information why this exception is raised.)
I'm a little surprised. I thought that everything above 128 gets escaped:
https://github.com/larsrh/libisabelle/blob/f9bf9bdc2b9f74e80681979169b00749c33033ba/modules/libisabelle/src/main/scala/Codec.scala#L57
It's hard to be sure because I don't know how to see a stack trace for the ML side. But I suspect the following happens:
- libisabelle produces properly escaped XML, e.g.,
ÈӒ - The ML code decodes the XML. When it encounters
È, it callsChar.chr 200to get the first character. This succeeds. Then it encountersӒ, it callsChar.chr 1234. This raises an exception because chars in ML cannot be >256.
As far as I understand, strings in ML cannot have code positions >=256, so there is no valid interpretation of ÈӒ (unless we encode the strings in UTF8 but I think that's not what is done).
Usually, one should not need non-ASCII strings in Isabelle anyway because it uses \<...> sequences for all non-ASCII symbols anyway. So an error seems to be the right response.
(One can convert a certain fragment of Unicode to symbols and back, https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L566 and https://github.com/dominique-unruh/qrhl-tool/blob/1f1f94f2ec7e86d6867f55495026458c5fdf21d8/src/main/scala/qrhl/isabelle/Isabelle.scala#L547 contain code for that.)
Okay, I can see that now. Could you PR the change?
To make a PR I would have to set up the whole libisabelle build. Instead, here is just the code fragment that I would replace Codec.string with:
private def check256(str:String): String = {
if (str.exists(_>=256))
throw new RuntimeException // TODO put here whatever exception is appropriate according to the design of libisabelle
str
}
implicit def string: Codec[String] = text[String](check256, Some.apply, "string").tagged("string")