analyzer
analyzer copied to clipboard
Migrate from CIL's `Pretty` to OCaml's `Format`
In #210 it came up multiple times that OCaml's standard Format
module could replace Pretty
from CIL with additional benefits:
- It's maintained in OCaml's standard library while
Pretty
isn't properly maintained (maybe we should do this migration even in our CIL fork eventually). - It has much more features for producing pretty output.
- It's supposed to be faster by not constructing
Pretty.doc
structures but writing to a formatter directly (a la howPrintable.printXml
currently works). - It's supposed to be faster by using GADTs for internal implementation instead of
Obj
(https://github.com/goblint/analyzer/issues/210#issuecomment-831395278). -
ppx_deriving.std
hasderiving show
that also derives a pretty-printer in addition to a simpleshow
(https://github.com/goblint/analyzer/issues/210#issuecomment-831395278). - It allows custom semantic tags, e.g. for colors, and their behavior can depend on where the output is going (to the terminal or to a string).
TODO
- [ ] Replace
Printable.pretty
withval pp: Format.formatter -> t -> unit
. This is the de facto standard naming for these pretty printer andderiving show
also derives this signature. - [ ] Rewrite every implementation of
pretty
aspp
. - [ ] Rewrite every
Pretty.printf
withFormat.printf
by usingpp
instead ofpretty
. - [ ] Rewrite tracing system to use
Format
instead ofPretty
. - [ ] Rewrite every
trace
by usingpp
instead ofpretty
. - [ ] Implement
pp
for CIL types, e.g.pp_exp
to replaced_exp
. CIL has a lot ofPretty
-based code to pretty-print everything (including entire programs). We don't need all of it though unless we also decide to migrate CIL itself.
maybe we should do this migration even in our CIL fork eventually
I think we should probably do this at the same time, because otherwise we will be relying on Pretty.sprint
for the output of CIL types to strings which makes for strange code:
Pretty.sprint ~width:80 (Cil.d_exp () exp)
For an initial migration in Goblint, we could still do that, but maybe with ~width:max_int
so Pretty
doesn't add line-wrapping newlines. I think the CIL pretty printers for expressions and other smaller things don't actually make use of Pretty.break
and other fancy features anyway. it's just that CIL has lots of Pretty
code for statements, functions and entire programs, which we don't really need, but would need to be rewritten for a complete CIL migration. It of course would be good for keeping our CIL fork somewhat modern, but maybe not so high priority for us.
But of course for general use of Format
it's a very bad style to construct intermediates as strings, because it doesn't allow Format
to decide about nice line breaks and the inner forced string doesn't know how indented it is or what the outmost width is.
The general principle should be that the pretty-printing width is only determined at the outermost level (e.g. where printf
actually happens) and everything else automagically gets adapted to that. It would allow for nice things like having the pretty printing be based on actual terminal width instead of enforcing an annoying 80 character width with tons of newlines even when you look at it in a wider terminal. For non-terminal output it could be unlimited (max_int
) or configurable.
Here's an interesting idea, which might make migration easy, but I don't know if it's possible:
Define our own Pretty
module with type doc = Format.formatter -> unit
and a matching dprintf
to be a complete drop-in implementation, except for the CIL stuff.
Probably not due to some %a
typing reason. Pretty
has them of type unit -> t -> doc
, but Format
has them of type formatter -> t -> unit
. The above suggestion would make our definitions of type unit -> t -> formatter -> unit
, which wouldn't work for Format
. I guess a drop-in implementation would only work if a custom fprintf
-like thing is defined to do some argument flipping before using Format.fprintf
.
I tried really hard to achieve this, but I'm fairly sure now that such drop-in dprintf
cannot be done. Even though OCaml's format stuff is super involved (e.g. this GADT with 12 type parameters) and some of the internals are exposed via CamlInternal*
modules, the key parts that actually call the function for %a
are in the implementation of Format
etc but not exposed by their signatures.