ppx_deriving icon indicating copy to clipboard operation
ppx_deriving copied to clipboard

show doesn't have a handler for Uchar.t

Open pmetzger opened this issue 7 years ago • 25 comments

show doesn't have a built-in handler for Uchar.t, which makes it messy to try to derive a show function for something containing a Uchar.t.

pmetzger avatar Oct 05 '18 00:10 pmetzger

(BTW, what's a reasonable temporary workaround?)

pmetzger avatar Oct 05 '18 01:10 pmetzger

In general, if there's a type without an existing deriver you can create a type alias and define your own deriver. In the case of show you'd need a pretty printer for the type. For example, something very close to the following should work if you fill in an appropriate definition for pp_uc.

type uc = Uchar.t
let pp_uc pp uc = ...
type t = { u : uc } [@@deriving show]

hcarty avatar Oct 05 '18 03:10 hcarty

Please submit a PR for Uchar.t, this would be very welcome.

whitequark avatar Oct 05 '18 09:10 whitequark

I can submit a PR for Uchar.t, but we need to decide on a printed representation that we're comfortable with. (That is to say, can someone suggest a color for the bikeshed?)

pmetzger avatar Oct 05 '18 12:10 pmetzger

Format.printf "U+%04X" codepoint

whitequark avatar Oct 05 '18 12:10 whitequark

Or maybe Uchar.of_char '%c' for printable and Uchar.of_int 0x%04X for non-printable...

whitequark avatar Oct 05 '18 12:10 whitequark

That latter suggestion isn't far from what @Drup suggested, though figuring out what's printable given only the tools in the stdlib isn't easy, so it might need to default to the latter.

pmetzger avatar Oct 05 '18 12:10 pmetzger

U+HHHH seems problematic because it can't be a valid OCaml read syntax ever, at least not without nasty special casing.

pmetzger avatar Oct 05 '18 12:10 pmetzger

You can conservatively stick to ASCII. 0x20..0x7E?

whitequark avatar Oct 05 '18 12:10 whitequark

You can conservatively stick to ASCII. 0x20..0x7E?

Sorry, can you expand on that?

pmetzger avatar Oct 05 '18 12:10 pmetzger

@hcarty btw, in your example:

let pp_uc pp uc = ...

I am guessing the uc is the actual Uchar.t, but what's the pp argument?

pmetzger avatar Oct 05 '18 12:10 pmetzger

Sorry, can you expand on that?

I mean, convert it to int and check that it's between 0x20 and 0x7E. That should cover all printable ASCII range...

whitequark avatar Oct 05 '18 13:10 whitequark

My current prototype is:

let pp_uchar f uc =
  let ui = Uchar.to_int uc in
  if ui < 128
  then Format.fprintf f "(Uchar.of_char '%s')" (Char.escaped (Uchar.to_char uc))
  else Format.fprintf f "(Uchar.of_int 0x%04x)" ui

pmetzger avatar Oct 05 '18 14:10 pmetzger

You need to skip 0x00 to 0x1F inclusive, too.

whitequark avatar Oct 05 '18 14:10 whitequark

Why? Tab is better expressed as '\t' etc.

pmetzger avatar Oct 05 '18 14:10 pmetzger

(I can see suggesting that things other than \t, \n, \r etc. should be expressed in the "normal" way though.)

pmetzger avatar Oct 05 '18 14:10 pmetzger

Oh sorry, I missed Char.escaped. You are right.

whitequark avatar Oct 05 '18 15:10 whitequark

One suggestion for an OCaml Uchar.t direct syntax that’s evolved on the discord channel:

let pi : Uchar.t = \u'π'

(for direct entry of Unicode chars in source)

and

let alsopi: Uchar.t = \u{3C0}

(for entry of chars by their hex codepoint.)

It’s gross, but finding something less gross seems hard…

pmetzger avatar Oct 05 '18 18:10 pmetzger

This convention could be implemented in a printer as:

let pp_uchar f uc =
  let ui = Uchar.to_int uc in
  if (ui > 31 && ui < 127) || (ui = 9) || (ui = 10) || (ui = 13)
  then Format.fprintf f "\\u'%s'" (Char.escaped (Uchar.to_char uc))
  else Format.fprintf f "\\u{%x}" ui

pmetzger avatar Oct 05 '18 19:10 pmetzger

Okay, so I started looking at implementing this in ppx_deriving.show and realized that I really should be proposing implementing it in Printf so it could be standard across OCaml, but for the moment, I'm not sure I entirely understand the code in ppx_deriving_show.cppo.ml in the expr_of_type function. Do I have the right place, though? That seems to be where this would below.

pmetzger avatar Oct 14 '18 18:10 pmetzger

Yes, I believe that is the right place to implement support for Uchar.t.

whitequark avatar Oct 29 '18 06:10 whitequark

One question is, should I start there, or should I propose something for Printf or even the lexer?

pmetzger avatar Oct 29 '18 13:10 pmetzger

Why not both? The stdlib/compiler changes will likely take a lot of time.

whitequark avatar Oct 29 '18 13:10 whitequark

I suppose the "why not both" is that I've gotten gunshy about proposing stdlib and compiler changes, but yah, I suppose I should start somewhere.

Is the proposed syntax tasteful enough to you?

pmetzger avatar Oct 29 '18 18:10 pmetzger

@pmetzger I like the syntax.

whitequark avatar Nov 08 '18 04:11 whitequark