Menhir with external tokens in ML file
There are situations where the tokens should not live in parser.mly. For example, when you need a PPX to generate functions for the token type, a separate tokens.mly is not a good idea. Take the following example:
- token.ml
type token =
| FOO of int
| BAR
[@@deriving show]
- parser.mly
%start <int> main
%%
main:
| i = FOO BAR { i }
Now consider this dune configuration:
(menhir
(flags (--external-tokens Token))
(modules (parser)))
(library
...
(modules (parser token)))
This won't build, because building the parser requires the token module to be precompiled, but the Menhir rules seem to be executed before that.
I think providing a way to state explicitly that the parser depends on the token module would solve this. The alternative is a lot of workarounds with .cppo files to derive the tokens only in some situations, which wouldn't be easy for beginners.
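For context, the [@@deriving show] attribute in token.ml asks ppx_deriving to generate printers for the type; for a type named token these are pp_token and show_token. The following is a minimal hand-written approximation, not the PPX's output: it drops the attribute so it compiles without ppx_deriving, and the real generated printers use a different format (by default they include the module path of constructors, which the with_path = false option turns off).

```ocaml
(* Same token type as token.ml, minus [@@deriving show], so this sketch
   compiles without ppx_deriving. *)
type token =
  | FOO of int
  | BAR

(* Hand-written stand-ins for the printers [@@deriving show] would generate
   for a type named [token]. Output format is an approximation. *)
let pp_token (fmt : Format.formatter) (t : token) : unit =
  match t with
  | FOO i -> Format.fprintf fmt "FOO %d" i
  | BAR -> Format.pp_print_string fmt "BAR"

(* show_token renders a token to a string via pp_token. *)
let show_token (t : token) : string = Format.asprintf "%a" pp_token t

let () = print_endline (show_token (FOO 3)) (* prints "FOO 3" *)
```

Keeping these printers next to the type in an ordinary .ml file is exactly what a .mly-only token declaration cannot offer, which is the motivation for the external-tokens setup.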
This is similar to #305. We started looking at the other PR in more detail, and there is still one more thing to change in jbuilder before we can support either menhir --infer or this feature.
Given that we now support --infer, this shouldn't be hard to add either. This is a low-priority feature for the dev team, so it likely won't get done without external help. If anyone is willing to help out, please reach out and we can give some guidance.
I'd love to have this. How can I help?
"Won't build because a requirement for building the parser is having the token precompiled, but the Menhir rules seem to be executed before."
What does it mean to have the token "precompiled"? Do we need the cmi for the Token module?
Once this is clarified I can write some pointers on how to implement this.
Polite bump. I don't know much about OCaml infrastructure, but the error message when running
(menhir
(flags --external-tokens Token)
(modules "parser"))
is Error: Unbound constructor MY_TOKEN, where MY_TOKEN is, obviously, some constructor. Menhir just generates .ml and .mli files containing the line type token = Token.token, and the .ml file refers to Token.MY_TOKEN (etc.). So, per this answer, I am led to believe the cmi is what's needed, yes.
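The type equality described above can be illustrated in plain OCaml. In this minimal sketch, Parser is a hypothetical hand-written stand-in for Menhir's output (not actual generated code): it only re-exports the type through type token = Token.token and reaches the constructors through the Token module. That reference is why Token's compiled interface must exist before Parser can be type-checked.

```ocaml
(* Stand-in for token.ml (PPX attribute omitted so the sketch compiles
   without ppx_deriving). *)
module Token = struct
  type token =
    | FOO of int
    | BAR
end

(* Hypothetical stand-in for a parser built with --external-tokens Token:
   a type equation rather than a new definition, so the constructors are
   referred to through Token. *)
module Parser = struct
  type token = Token.token

  let describe : token -> string = function
    | Token.FOO i -> "FOO " ^ string_of_int i
    | Token.BAR -> "BAR"
end

(* Parser.token and Token.token are the same type, so values built with
   Token's constructors can be passed straight to Parser. *)
let () =
  print_endline (Parser.describe (Token.FOO 42));
  print_endline (Parser.describe Token.BAR)
```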
Another bump: this is important to (several of?) my projects, and dropping external orchestration and tooling to let Dune run the whole build would be massively beneficial.
(While I'm here: does anybody have a good workaround, even if it's janky?)
@dunnl, could you give a minimal example that reproduces the unbound constructor error? We run ocamldep on the generated module, so the dependency on Token should be picked up.
+1: the ability to add something like [@@deriving show] would be very useful here.
Curious whether @haskellcamargo or others have found a suitable workaround (especially for deriving show with ppx_deriving).
It would help if you explained the issue in more detail. I don't see why (flags --external-tokens Token) would not work.
In particular, a small dune project that reproduces the issue is all I need.
Note that the original example is wrong. parser.mly should be:
%token <int> FOO
%token BAR
%start <int> main
%%
main:
| i = FOO BAR { i }
You still need to declare the tokens in the .mly file. But you will see that menhir generates the necessary type equality to use your external module: type token = Token.token
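To make the corrected setup concrete, here is a minimal sketch of how a driver feeds Token.token values to the parser. Parser below is a hypothetical hand-written stand-in that mimics the shape of Menhir's monolithic entry point for %start <int> main, namely main : (Lexing.lexbuf -> token) -> Lexing.lexbuf -> int, raising Parser.Error on a syntax error. lexer_of_list is likewise a hypothetical toy lexer standing in for an ocamllex-generated one.

```ocaml
(* Stand-in for token.ml (PPX attribute omitted). *)
module Token = struct
  type token =
    | FOO of int
    | BAR
end

(* Hypothetical stand-in for the generated Parser module. It accepts only
   the rule  main: i = FOO BAR { i }  and returns i. *)
module Parser = struct
  type token = Token.token

  exception Error

  let main (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
    match lexer lexbuf with
    | Token.FOO i -> (
        match lexer lexbuf with
        | Token.BAR -> i
        | Token.FOO _ -> raise Error)
    | Token.BAR -> raise Error
end

(* Toy lexer: ignores the lexbuf and replays a fixed list of tokens. *)
let lexer_of_list (toks : Token.token list) : Lexing.lexbuf -> Token.token =
  let remaining = ref toks in
  fun _lexbuf ->
    match !remaining with
    | t :: rest ->
        remaining := rest;
        t
    | [] -> failwith "lexer_of_list: no more tokens"

let () =
  let lexbuf = Lexing.from_string "" in
  let n = Parser.main (lexer_of_list [ Token.FOO 42; Token.BAR ]) lexbuf in
  Printf.printf "parsed %d\n" n
```

With a real Menhir-generated parser, the driver code would stay the same; only the stand-in module and the toy lexer would be replaced.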
In case it helps in understanding this issue, here's the relevant orchestration from one of the projects I mentioned above (mind you, this is ancient now, so I doubt it'll be helpful): https://github.com/ELLIOTTCABLE/excmd.js/blob/3cb1ef3bd8c41a3505776f57209588ae96955ea6/packages/bs-excmd/bsconfig.json#L62-L68
Here's the script that gets applied to tokens.generated.ml{,i}: https://github.com/ELLIOTTCABLE/excmd.js/blob/main/packages/bs-excmd/scripts/annotateMenhirTypes.ml
From my (very fuzzy) memory and the sparse docs, the point was to control the annotations on the type token declaration; in that particular case, it needed something like:
type token =
  (* ... *)
[@@bs.deriving jsConverter]
[@@deriving show { with_path = false }, to_yojson { optional = true }]
AFAIK, though, this is still an issue: unless I've missed recent developments, there's still no way to control the generated tokens.ml module besides manually (1) splitting the menhir invocation into two separate --only-tokens and --external-tokens invocations, and then (2) using external automation to modify the generated tokens between those steps?
(If this is solved, I'd posit that it's a common enough issue that it perhaps deserves a mention in the README, i.e. "if you want [@@deriving show] on your Token module,
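Step (2) of that workaround amounts to text rewriting on the generated tokens module. Here is a minimal, hypothetical sketch (it is not the annotateMenhirTypes.ml script linked above). It assumes the token type definition starts on a line beginning with "type token " and ends at the first blank line after it, or at end of file. That layout may not hold for every Menhir version or grammar. The sketch inserts the given attribute line at that point.

```ocaml
(* Hypothetical post-processing step: insert an attribute line right after
   the [type token] definition.
   ASSUMPTION: the definition starts on a line beginning with "type token "
   and ends at the first blank line that follows it (or at end of input). *)
let annotate_token_type ~(attr : string) (source : string) : string =
  let prefix = "type token " in
  let starts_type l =
    String.length l >= String.length prefix
    && String.sub l 0 (String.length prefix) = prefix
  in
  (* [in_type]: currently inside the token type definition;
     [inserted]: the attribute line has already been emitted. *)
  let rec go acc in_type inserted = function
    | [] -> List.rev (if in_type then attr :: acc else acc)
    | l :: rest when inserted -> go (l :: acc) false true rest
    | l :: rest when in_type && String.trim l = "" ->
        (* A blank line closes the definition: emit the attribute before it. *)
        go (l :: attr :: acc) false true rest
    | l :: rest -> go (l :: acc) (in_type || starts_type l) false rest
  in
  String.concat "\n" (go [] false false (String.split_on_char '\n' source))

let () =
  let generated = "type token =\n  | FOO of int\n  | BAR\n\nlet x = 1" in
  print_endline (annotate_token_type ~attr:"[@@deriving show]" generated)
```

A line-based rewrite like this is deliberately naive. A more robust tool would parse the generated file with compiler-libs instead of matching text.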
Yes, I tried it and --external-tokens works as expected.
If anything needs clarifying, it's the Menhir manual. There's an expectation that adding --external-tokens relieves the programmer of declaring the tokens in the .mly file, since the type definition now exists in OCaml. That's not the case: the only thing --external-tokens does is establish the type equality between the token type in the external module and the token type in the .mly file.
Apologies for the initial +1! Shortly afterwards, I too was able to get this (highly useful) feature to work and meant to loop back and report it. Thank you @rgrinberg.
@ELLIOTTCABLE: it's outside the scope of ocaml/dune, but on reading the ppx_deriving manual, I noticed that there are ways to effectively get show at a call site for a value whose type has no @deriving annotations. I don't have references handy at the moment, but they should be findable with some googling.
There's also this clever library (not sure how sound it is, sadly, but it looks quite cool) that introduces ways to "re-open"/modify an existing module, e.g. adding type t = _ [@@deriving show] to your token type: https://github.com/thierry-martinez/override