show doesn't have a handler for Uchar.t
show doesn't have a built-in handler for Uchar.t, which makes it messy to try to derive a show function for something containing a Uchar.t.
(BTW, what's a reasonable temporary workaround?)
In general, if there's a type without an existing deriver you can create a type alias and define your own deriver. In the case of show you'd need a pretty printer for the type. For example, something very close to the following should work if you fill in an appropriate definition for pp_uc.
type uc = Uchar.t
let pp_uc pp uc = ...
type t = { u : uc } [@@deriving show]
Please submit a PR for Uchar.t; this would be very welcome.
I can submit a PR for Uchar.t, but we need to decide on a printed representation that we're comfortable with. (That is to say, can someone suggest a color for the bikeshed?)
Format.printf "U+%04X" codepoint
Or maybe Uchar.of_char '%c' for printable and Uchar.of_int 0x%04X for non-printable...
That latter suggestion isn't far from what @Drup suggested, though figuring out what's printable given only the tools in the stdlib isn't easy, so it might need to default to the latter.
U+HHHH seems problematic because it can't be a valid OCaml read syntax ever, at least not without nasty special casing.
You can conservatively stick to ASCII. 0x20..0x7E?
Sorry, can you expand on that?
@hcarty btw, in your example:
let pp_uc pp uc = ...
I am guessing the uc is the actual Uchar.t, but what's the pp argument?
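On the pp question: as far as I know, the show deriver looks for a printer named pp_<type> with the usual Format shape, Format.formatter -> t -> unit, so the first argument is the formatter to print to and the second is the value. A minimal sketch follows; the "(Uchar.of_int 0x....)" output is only a placeholder assumption, since the printed representation is exactly what is still being decided in this thread.

```ocaml
(* Alias so that the deriver looks for a printer named pp_uc. *)
type uc = Uchar.t

(* First argument: the Format.formatter to print to. Second: the value.
   The "(Uchar.of_int 0x....)" text is a placeholder representation,
   not an agreed-upon format. *)
let pp_uc (fmt : Format.formatter) (u : uc) : unit =
  Format.fprintf fmt "(Uchar.of_int 0x%04X)" (Uchar.to_int u)

(* With ppx_deriving enabled, one would then write:
     type t = { u : uc } [@@deriving show]            *)

(* The printer plugs into %a like any other Format printer. *)
let example : string = Format.asprintf "%a" pp_uc (Uchar.of_int 0x3C0)
```

Here example evaluates to "(Uchar.of_int 0x03C0)".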
I mean, convert it to int and check that it's between 0x20 and 0x7E. That should cover all printable ASCII range...
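For concreteness, a sketch of that check. The name is_printable_ascii is made up for illustration and is not an existing stdlib function.

```ocaml
(* Printable ASCII as suggested above: 0x20 (space) through 0x7E (~).
   is_printable_ascii is a hypothetical helper name. *)
let is_printable_ascii (u : Uchar.t) : bool =
  let i = Uchar.to_int u in
  i >= 0x20 && i <= 0x7E
```

This deliberately treats everything outside ASCII as "not printable", which is conservative but needs nothing beyond the stdlib.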
My current prototype is:
let pp_uchar f uc =
  let ui = Uchar.to_int uc in
  if ui < 128
  then Format.fprintf f "(Uchar.of_char '%s')" (Char.escaped (Uchar.to_char uc))
  else Format.fprintf f "(Uchar.of_int 0x%04x)" ui
You need to skip 0x00 to 0x1F inclusive, too.
Why? Tab is better expressed as '\t' etc.
(I can see suggesting that things other than \t, \n, \r etc. should be expressed in the "normal" way though.)
Oh sorry, I missed Char.escaped. You are right.
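To make the Char.escaped point concrete, here is a small sketch of what it returns for a few characters. The binding names are only for illustration.

```ocaml
(* Char.escaped escapes characters following OCaml's lexical conventions,
   so wrapping the result in single quotes gives a readable char literal. *)
let escaped_tab = Char.escaped '\t'    (* backslash followed by t *)
let escaped_nul = Char.escaped '\000'  (* backslash followed by 000 *)
let escaped_del = Char.escaped '\127'  (* backslash followed by 127 *)
let escaped_quote = Char.escaped '\''  (* backslash followed by a quote *)
let escaped_a = Char.escaped 'a'       (* printable characters pass through *)
```

So the control characters below 0x20 come out as either a named escape or a three-digit decimal escape, both of which the compiler reads back inside '...'.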
One suggestion for a direct OCaml syntax for Uchar.t that’s evolved on the Discord channel:
let pi : Uchar.t = \u'π'
(for direct entry of Unicode chars in source)
and
let alsopi: Uchar.t = \u{3C0}
(for entry of chars by their hex codepoint.)
It’s gross, but finding something less gross seems hard…
This convention could be implemented in a printer as:
let pp_uchar f uc =
  let ui = Uchar.to_int uc in
  if (ui > 31 && ui < 127) || (ui = 9) || (ui = 10) || (ui = 13)
  then Format.fprintf f "\\u'%s'" (Char.escaped (Uchar.to_char uc))
  else Format.fprintf f "\\u{%x}" ui
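A quick way to try that printer out is sketched below. The printer is repeated so the snippet stands alone; show_uchar is a hypothetical helper name, not something ppx_deriving or the stdlib provides.

```ocaml
(* Same printer as above, repeated so this snippet is self-contained. *)
let pp_uchar f uc =
  let ui = Uchar.to_int uc in
  if (ui > 31 && ui < 127) || ui = 9 || ui = 10 || ui = 13
  then Format.fprintf f "\\u'%s'" (Char.escaped (Uchar.to_char uc))
  else Format.fprintf f "\\u{%x}" ui

(* Hypothetical helper: render a Uchar.t to a string with the printer. *)
let show_uchar (uc : Uchar.t) : string = Format.asprintf "%a" pp_uchar uc

(* Prints one line per code point:
     \u'a'
     \u'\t'
     \u{3c0}
     \u{0}      *)
let () =
  List.iter
    (fun i -> print_endline (show_uchar (Uchar.of_int i)))
    [ 0x61; 0x09; 0x3C0; 0x00 ]
```

Note that %x prints lowercase hex, so pi comes out as \u{3c0} rather than the \u{3C0} spelling used in the example above. Switching to %X would match that spelling, if case matters for the eventual read syntax.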
Okay, so I started looking at implementing this in ppx_deriving.show and realized that I really should be proposing implementing it in Printf so it could be standard across OCaml. For the moment, though, I'm not sure I entirely understand the code in the expr_of_type function in ppx_deriving_show.cppo.ml. Do I have the right place? That seems to be where this would belong.
Yes, I believe that is the right place to implement support for Uchar.t.
One question is, should I start there, or should I propose something for Printf or even the lexer?
Why not both? The stdlib/compiler changes will likely take a lot of time.
I suppose the "why not both" is that I've gotten gun-shy about proposing stdlib and compiler changes, but yah, I suppose I should start somewhere.
Is the proposed syntax tasteful enough to you?
@pmetzger I like the syntax.