
[RFC] unit headers for OCaml source files

Open gasche opened this issue 4 years ago • 12 comments

The RFC stems from (old thinking/discussions and) the discussion of https://github.com/ocaml/ocaml/pull/10319 . The RFC text is reproduced below.


Unit headers for OCaml source files

Context

In OCaml, source files play a double role:

  • They are interpreted inside the language as modules, formed by a sequence of structure items. (Modules can be nested, but a file always acts as a toplevel module.)

  • They are interpreted by the compilation tools as "compilation units", the primary units of compilation and linking, whose dependencies on other compilation units are tracked and whose linking order determines the program semantics.

    (Technically a compilation unit is formed by a pair of a .ml and a .mli file, but sometimes only one of them when the other does not exist.)

Some things that OCaml programmers can express only make sense for compilation units, not modules. Currently they can only be expressed through compiler command-line options, typically stored in the build system. For example:

  • Dependencies on other compilation units or (in general) archives/libraries/packages.
  • Global compilation options (-safe-string, -rectypes).

Sometimes it would be convenient, even important, to specify those aspects in the source code itself, but there is no place in the syntax to specify them: they are not valid structure items as they don't make sense inside an arbitrary (nested) module.

One example of the problem

One example use-case is [@@@warning "-missing-mli"]: we would like to let users explicitly disable the new missing-mli warning (introduced by #9407 in 4.13~dev) inside a particular .ml file, indicating that it intentionally does not have a corresponding .mli file.

This warning is implemented at the level of compilation units, not during the checking/compilation of the module code, so the current implementation of [@@@warning ..] does not support disabling it: it only enables/disables warnings for the following structure items in the current module.
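To make the current scoping concrete, here is a small sketch. The names Quiet and answer are made up for illustration, and warning 26 (unused variable) stands in for missing-mli because, unlike missing-mli, it is reported while checking the module code.

```ocaml
(* Illustrative only: Quiet and answer are hypothetical names. *)
module Quiet = struct
  (* Disables warning 26 (unused variable) for the items that follow
     in this structure only. *)
  [@@@warning "-26"]

  let answer () =
    let unused = 0 in (* would normally trigger warning 26 *)
    42
end

(* Outside Quiet, the warning settings are whatever they were before
   the module was entered. *)
```

A unit-level warning such as missing-mli has no "following structure items" to attach to, which is why this mechanism cannot reach it as it stands.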

A proposal exists to change the semantics of toplevel @@@warning attributes to remain in scope for the whole checking/compilation of the compilation unit, see #10319. This is a special case of one of the two options discussed in this RFC, and it led to the present discussion.

Proposals

There are two proposals to address this issue, one "implicit" and one "explicit".

The [@@@warning "-missing-mli"] PR implements the implicit proposal.

I prefer the explicit proposal.

Implicit proposal: handle toplevel attributes/extensions at the compilation unit level

We could consider that floating attributes and extensions that are at the toplevel of a file are not interpreted as "normal" structure items, at the level of the module, but instead as "unit" attributes/extensions at the level of the unit.

let foo = ...

[@@@warning "-missing-mli"] (* warning setting for the whole compilation unit *)

module Foo = struct
  [@@@warning "..."] (* warning setting for a submodule only *)
end

Pros:

  1. Reasonably easy to implement, no syntax change (we reinterpret syntax differently).

  2. This is consistent with the way toplevel directives #foo ;; are handled today: toplevel directives are only valid at the toplevel, but can be mixed with other structure items.

Cons:

  1. Confuses two notions.

  2. We lose the current property that any OCaml code can be moved inside a submodule, preserving its meaning.

  3. We cannot hope to extend this idea in the future to support global settings, such as -rectypes, because it would be a mess to allow those to change in the middle of other structure items.

Variant

One possible variant of this proposal would be to specify certain attributes/extensions as "header attributes", which have the same syntax as floating structure/signature-level attributes/extensions, but can only be used at the beginning of the file (before any non-header construct). This solves Cons.3, but aggravates Cons.1 by creating more surprises for users (certain toplevel floating attributes can be moved around and others not, etc.).

Explicit proposal: create a "header" extension for compilation-unit configuration

Instead of implicitly treating toplevel attributes/extensions as scoping over the whole compilation-unit, we propose a builtin ocaml.unit_header extension whose content should be understood as scoping over the whole compilation unit, not just a module.

[%%unit_header
  [@@@warning "-missing-mli"]
  [@@@rectypes]
]

let foo = ...

"Unit headers" must come before any other structure/signature items (comments are allowed before headers). They are the only component of the .ml syntax that cannot be moved into a submodule (doing so results in an error).
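The placement rule can be sketched on a toy model of a source file. Everything here is hypothetical: the item type, the split_unit_headers function and the error message are illustrations, not the compiler's parsetree or an existing API. The sketch also assumes that several consecutive headers are allowed at the start of the file.

```ocaml
(* Hypothetical, simplified model of the items of a source file. *)
type item =
  | Unit_header of string list (* unit-level settings, kept as plain text *)
  | Item of string             (* any other structure/signature item *)

(* Separate the unit-level settings from the ordinary items.
   Headers are accepted only before the first ordinary item;
   a header found later is reported as an error. *)
let split_unit_headers (items : item list)
  : (string list * string list, string) result =
  (* Collect the settings of all leading headers, in source order. *)
  let rec leading settings = function
    | Unit_header s :: rest -> leading (List.rev_append s settings) rest
    | rest -> (List.rev settings, rest)
  in
  let settings, rest = leading [] items in
  (* Check that no header appears among the remaining items. *)
  let rec body acc = function
    | [] -> Ok (settings, List.rev acc)
    | Unit_header _ :: _ -> Error "unit header after a structure item"
    | Item s :: rest -> body (s :: acc) rest
  in
  body [] rest
```

A real implementation would work on the parsetree and report a located error; the point here is only that the check is a single pass over the toplevel items.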

Note: this RFC does not propose a new @@@rectypes attribute to be supported here; it is an example of the sort of feature that could, over time, become available in unit headers. [@@@warning "-missing-mli"] would be immediately adapted to work (only) in unit headers, but the RFC proposes the "header" notion itself, and not any specific item to be part of it.

Future extensions

In the future, certain toplevel directives could be allowed in the unit header. This is not proposed here.

We could imagine certain tools querying the unit header of source files for configuration (or querying the compiler to ask for them), for example to support a #require ... directive integrated in the build system. This is not proposed here, and is in fact not necessarily the best approach. I think it probably makes more sense to reserve the header for aspects of OCaml programs that the compiler knows about (that correspond to command-line options), so that header interpretation is left entirely to the compiler. If someday we have a compiler that handles dependencies (and/or ppx resolution, etc.) by itself, then those aspects would become naturally specifiable in unit headers.

gasche avatar Apr 17 '21 13:04 gasche

Another point that might be worth investigating is the integration with the functor unit RFC #11. It seems that functor units would benefit from the possibility of communicating information about the compilation units in the files themselves.

Octachron avatar Apr 19 '21 10:04 Octachron

Another data point: we already have header attributes for alerts (that I had forgotten). For instance, with

(* a.mli *)
[@@@alert Scylla]
module M: sig [@@@alert Charibdys] end

the Scylla alert is attached to the A module (or compilation unit?), but the Charibdys alert is not attached to M. If we had an explicit header, we could avoid some confusion.

Octachron avatar Feb 24 '22 10:02 Octachron

I must admit that I didn't push for this RFC in any particular way. Florian, if you are willing to openly declare yourself as a supporter, maybe we should start lobbying around? Is there a concrete proposal that we both like that we could push specifically to start a conversation, for example [%%unit_header ...]?

gasche avatar Feb 24 '22 14:02 gasche

What is your stance on extending the existing syntax in an explicit way by naturally adding an extra @ to indicate the extended scope? E.g., using four @ for the CU-level annotations,

[@@@@warning "-missing-mli"]
[@@@@rectypes]

I personally find this easier to remember and use rather than

[%%unit_header
  [@@@warning "-missing-mli"]
  [@@@rectypes]
]

which requires us (OCaml users) to remember an extra bit of syntax and protocol.

Are there any specific reasons why we need to pack CU annotations in blocks?

ivg avatar Feb 24 '22 16:02 ivg

The advantage of the %%unit_header proposal is that it requires no syntactic change. This said, I think the proposal of using a new level for unit-global annotation is also fine -- as long as, like %%unit_header, we forbid using those annotations after any other kind of structure item. And it does look a bit nicer on the eye.

gasche avatar Feb 24 '22 21:02 gasche

My main concern with [@@@@ ...] is that it begs the question of where we stop: will we one day need [@@@@@ ... ] or [@@@@@@ ...] attributes? Maybe pack level annotations should be denoted with [@@@@@ ... ] and library level with [@@@@@@ ... ]? (And don't forget that I cannot (don't want to) count beyond 7 in unary without pretty pictograms like 𓆼.)

Nevertheless, I agree that by itself [@@@@ ...] is alright and probably slightly nicer than a [%%header ... ] node in isolation.

Octachron avatar Feb 24 '22 21:02 Octachron

Yes, the backward compatibility concern totally justifies it. For some reason, I was thinking that the parser will accept [@@@@foo] and just ignore it. I also agree with Florian, that after some number of repetitions the syntax becomes ugly and unreadable.

With that said, I am still not sure that the language really needs this. The build system things should be handled by the build systems. E.g., we had the nice notion of tags in ocamlbuild for that. Something similar could be easily implemented in dune. Unfortunately, the current state of compilers (not just OCaml, the whole industry of compilers) still forces us to have various pragmas for optimization and inlining and so on, but I really like that OCaml was (and still largely is) free of them, so that the source code is about the program and its logic, which keeps it maximally declarative. I am afraid that this change might open Pandora's box, after which we will start specifying dependencies and other build system-related stuff in the source code.

ivg avatar Feb 25 '22 17:02 ivg

Apparently a similar Rust RFC has just been approved. Their main use-case seems to be single-file programs (cargo script), rather than module-specific data in a larger program.

gasche avatar May 05 '24 05:05 gasche

Note that there seems to be quite a bit of overlap with the existing # ocaml directives, which I always found a pity that ocamlc/ocamlopt refuse to parse (if only to ignore).

For example in b0caml, a native scripting system for OCaml, I tried to use (and repurpose) them in an initial preamble (the #directory primitive was the equivalent of #require, something designed in a time where I was optimistic we could get to something simple on the library handling front). Another example is the B0.ml files of the b0 build system, which add both # and a few B0-specific directives in order to describe how the build description should be compiled.

dbuenzli avatar May 05 '24 06:05 dbuenzli

I had a brief discussion about this with @Octachron a few days ago, and we agree that accepting a preamble of toplevel directives would be nice. This would push us to add interesting directives that may be missing today (systematically reflect relevant command-line flags as directives), which could also help other users.

It should be easy to parse the preamble for external tools, and in particular to know where it ends. I wonder if we could make explicit ;; mandatory in that part of the source file.
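As a rough illustration of how little an external tool would need if ;; were mandatory, here is a sketch that finds where such a preamble ends. The functions is_directive and preamble are hypothetical, and the rules they encode are assumptions made for the example, not a proposed specification: each directive sits on one line, starts with # and ends with ;;, and blank lines are skipped. Comments are not handled.

```ocaml
(* A line counts as a directive if, once trimmed, it starts with '#'
   and ends with ";;". This is an assumption for the sketch. *)
let is_directive (line : string) : bool =
  let l = String.trim line in
  let n = String.length l in
  n >= 3 && l.[0] = '#' && String.sub l (n - 2) 2 = ";;"

(* Return the directives of the preamble, in order, together with the
   0-based index of the first line that is neither blank nor a
   directive (or the number of lines if there is no such line). *)
let preamble (source : string) : string list * int =
  let lines = String.split_on_char '\n' source in
  let rec go acc i = function
    | l :: rest when String.trim l = "" -> go acc (i + 1) rest
    | l :: rest when is_directive l -> go (String.trim l :: acc) (i + 1) rest
    | _ -> (List.rev acc, i)
  in
  go [] 0 lines
```

Without the mandatory ;; the tool would need a real OCaml lexer to tell where a directive's arguments stop, which is the cost the suggestion tries to avoid.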

gasche avatar Jun 05 '24 12:06 gasche

Over at https://github.com/ocaml/ocaml/pull/13471#issuecomment-2383491990, @stedolan discussed the need for something in the spirit of this RFC:

There are three reasonable places to specify the keyword set / language edition in use:

  1. On the compiler command line (this PR, proposed in RFC 27)
  2. In the file, as a toplevel attribute or lexer directive (not in this PR, but also proposed in RFC 27)
  3. Globally, as parameters to configure or to the opam switch (neither in this PR nor RFC 27)

I now think it is a mistake to provide only (1). It might even be a mistake to provide (1) at all.

The problem is that there are lots of tools other than the compiler which care what a keyword is: some preprocessors, merlin, ocamlformat, ocp-indent, editor syntax highlighting, etc. In many cases, the tool is given the text of the file, and there is no obvious way to communicate other information from the build. This can work with mechanism (2) (since the tool can see the attribute or directive) and mechanism (3) (since the tool is configured at install time), but not with mechanism (1).

Providing mechanism (1) as a fallback can be useful for when you want to compile old code without modification, where you accept that tooling won't work properly on said code. But it should at least not be the only provided mechanism.

We had this experience on the Jane Street branch when we first added locals. Initially, we used mechanism (1), a command-line flag, and regretted it. In that context, there is only one build system and a limited number of editors / syntax highlighters, and it was still painful.

"Toplevel attribute or lexer directive" seems fairly close to the proposals we made to use a dedicated extension, or to use toplevel (REPL) directives. (I think that "toplevel attribute" just means "an attribute at the beginning of the file", not toplevel in the REPL sense.)

gasche avatar Sep 30 '24 15:09 gasche