dune Menhir with external tokens in ML file

There are situations where the tokens should not live in the parser.mly file. When you need a PPX to derive functions over the token type, for example, a tokens.mly file is not a good fit. Take the following example:

token.ml

type token =
  | FOO of int
  | BAR
  [@@deriving show]

parser.mly

%start <int> main

%%

main:
| i = FOO BAR { i }

So:

(menhir
  (flags (--external-tokens Token))
  (modules (parser)))

(library
  ...
  (modules (parser token)))

Won't build because a requirement for building the parser is having the token precompiled, but the Menhir rules seem to be executed before.

I think providing a way to explicitly say that the parser needs the token module could solve this. Otherwise we are left with workarounds such as .cppo files that derive the tokens only in some situations, which wouldn't be easy for beginners.

Mar 29 '18 20:03 haskellcamargo

This is similar to #305. We started looking at the other PR in more detail and there is still one more thing to change in jbuilder before we can support either menhir --infer or this feature.

Mar 29 '18 20:03 ghost

Given that we have support for --infer now, this shouldn't be hard to add either. This is a low priority feature for the dev team, so it likely won't be done without external help. If anyone is willing to help out, please reach out and we can give some guidance.

Nov 06 '18 20:11 rgrinberg

I'd love to have this. How can I help?

Mar 24 '19 15:03 Leandros

"Won't build because a requirement for building the parser is having the token precompiled, but the Menhir rules seem to be executed before."

What does it mean to have the token "precompiled"? Do we need the cmi for the Token module?

Once this is clarified I can write some pointers on how to implement this.

Mar 25 '19 16:03 rgrinberg

Polite bump. I don't know much about OCaml infrastructure but the error message when running

(menhir
 (flags --external-tokens Token)
 (modules "parser"))

is Error: Unbound constructor MY_TOKEN, where MY_TOKEN is one of my token constructors. Menhir just generates ml and mli files containing the line type token = Token.token, and the ml file refers to Token.MY_TOKEN (etc). So per this answer I am led to believe the cmi is what's needed, yes.

Apr 24 '20 18:04 dunnl

Another bump; this is important to (several of?) my project(s); and dropping external orchestration and tooling to let Dune run the whole build would be massively beneficial.

(While I'm here: anybody have a good workaround, even if it's janky?)

Sep 01 '22 18:09 ELLIOTTCABLE

@dunnl could you give a minimal example with the unbound constructor error? We run ocamldep on the generated module, so the dependency on Token should be picked up.

Sep 03 '22 19:09 rgrinberg

+1 -- ability to add something like [@@deriving show] very useful here

Jan 14 '23 00:01 aryeh-looker

Curious if @haskellcamargo or others have found a suitable workaround (especially for the purposes of deriving show with ppx_deriving)

Jan 14 '23 00:01 aryeh-looker

It would help if you could explain the issue in more detail. I don't see why (flags --external-tokens Token) would not work.

Jan 16 '23 01:01 rgrinberg

In particular a small dune project to reproduce the issue is all I need.

Jan 16 '23 01:01 rgrinberg

Note that the original example is wrong. parser.mly should be:

%token <int> FOO
%token BAR

%start <int> main

%%

main:
| i = FOO BAR { i }

You still need to declare the tokens in the .mly file. But you will see that menhir generates the necessary type equality to use your external module: type token = Token.token
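
To illustrate what that type equality means in practice, here is a minimal sketch in plain OCaml. It is not actual Menhir output: the Parser module and its main function below are hand-written, hypothetical stand-ins, and the Token module mirrors the one from the original post without the ppx. It only shows that values built with Token's constructors are accepted wherever Parser.token is expected.

```ocaml
(* Stand-in for the hand-written token.ml from the original post,
   without the ppx so that the sketch has no dependencies. *)
module Token = struct
  type token =
    | FOO of int
    | BAR
end

(* Hypothetical stand-in for the generated parser: it only restates the
   equality `type token = Token.token` described above. *)
module Parser = struct
  type token = Token.token

  (* Toy replacement for the generated entry point: accepts FOO BAR. *)
  let main (tokens : token list) : int option =
    match tokens with
    | [ Token.FOO i; Token.BAR ] -> Some i
    | _ -> None
end

let () =
  (* Values built with Token's constructors are valid Parser.token values. *)
  match Parser.main [ Token.FOO 42; Token.BAR ] with
  | Some i -> Printf.printf "parsed %d\n" i
  | None -> print_endline "syntax error"
```

Because the two types are equal, a lexer can return Token.token values and hand them directly to the parser, and any functions defined in Token apply to them unchanged.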

Jan 16 '23 01:01 rgrinberg

If it contributes to understanding this issue: here's the relevant orchestration from one of the projects I was talking about above — mind you, this is ancient now, so I have my doubts that it'll be helpful: https://github.com/ELLIOTTCABLE/excmd.js/blob/3cb1ef3bd8c41a3505776f57209588ae96955ea6/packages/bs-excmd/bsconfig.json#L62-L68

Here's the script that's applying to the tokens.generated.ml{,i}: https://github.com/ELLIOTTCABLE/excmd.js/blob/main/packages/bs-excmd/scripts/annotateMenhirTypes.ml

Extracting some details from my (very fuzzy) memory and sparse docs, there was a need to control the annotations on the type token declaration — in that particular case, it needed something like:

type token =
   (* ... *)
   [@@bs.deriving jsConverter]
   [@@deriving show { with_path = false }, to_yojson { optional = true }]

AFAIK, though, this is still an issue — unless I've missed recent development, there's still no way to control the produced tokens.ml module, besides manually 1. breaking the menhir invocation into two separate --only-tokens and --external-tokens invocations, and then 2. using external automation to modify the produced tokens between those steps?

(If this is solved, I'd posit that it's a common enough issue that it perhaps deserves a mention in the README — i.e. "if you want [@@deriving show] on your Token module, "!)

Jan 16 '23 23:01 ELLIOTTCABLE

Yes, I tried it and --external-tokens works as expected.

If there's something to clarify, then it's the menhir manual. There's an expectation that adding --external-tokens relieves the programmer of declaring the tokens in the .mly file, since the type definition now exists in OCaml. That's not the case: the only thing --external-tokens helps with is establishing the type equality between the token type in the external module and the tokens declared in the .mly file.

Jan 17 '23 00:01 rgrinberg

Apologies for the initial +1! Shortly afterwards, I too was able to get this (highly useful) feature to work and meant to loop back to report as such. Thank you @rgrinberg.

@ELLIOTTCABLE -- it's outside the scope of ocaml/dune, but on reading the ppx deriving manual, I noticed that there are ways to effectively get show at a call-site when you have a value in hand whose type does not carry any @deriving annotations. I don't have references handy at the moment, but they should be findable with some googling.

There's also this clever library (not sure how sound it is, sadly, but looks quite cool) which introduces ways to "re-open"/modify an existing module; e.g. adding type t = _ [@@deriving show] to your token type. https://github.com/thierry-martinez/override
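
One general OCaml technique in this spirit is to re-export the type with its constructors restated, and attach the printer to the re-exported definition. The sketch below is an assumption about what might work here, not something taken from the ppx deriving manual or from the override library: Generated is a hypothetical stand-in for a module you do not control, and show is written by hand so that the example runs without any ppx. A deriver attribute could presumably be attached to the re-exported type instead.

```ocaml
(* Hypothetical stand-in for a module whose token type has no printer. *)
module Generated = struct
  type token =
    | FOO of int
    | BAR
end

module Token = struct
  (* Re-export: the same type as Generated.token, constructors restated. *)
  type token = Generated.token =
    | FOO of int
    | BAR

  (* Hand-written printer, standing in for what a deriver would generate. *)
  let show = function
    | FOO n -> Printf.sprintf "(FOO %d)" n
    | BAR -> "BAR"
end

let () =
  (* Values built through Generated can be printed through Token.show. *)
  print_endline (Token.show (Generated.FOO 7));
  print_endline (Token.show Generated.BAR)
```

The compiler checks that the restated constructors match the original definition, so the two modules cannot silently drift apart.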

Jan 18 '23 05:01 aryeh-looker
