ocaml icon indicating copy to clipboard operation
ocaml copied to clipboard

Add interface and implementation types that contain source location

Open malekbr opened this issue 2 years ago • 14 comments

This is a second attempt for #12629 , after #12643. It creates new parsetree interface and implementation types, and adds location information to these types. Existing tests that passed before still pass or are fixed to pass. Additional changes including adding an interface type to the typedtree, and a bunch of changes to the mlis that used to target the structure/signature types to target the interface/implementation types. Open questions/things to be determined:

  • Are the changes to the mlis proper, or are structure/signature better types to pass for some of these.
  • I changed the typedtree implementation type to have an impl_ prefix to make it more consistent with the remaining types and to disambiguate. Is that the proper thing to do?
  • I did not change the cm* formats to use interface/implementation because that felt outside the scope of this change, and would lead to data redundancy in the file without trimming other fields.
  • Should more tests be added?
  • Changes file is not updated.

malekbr avatar Oct 12 '23 00:10 malekbr

Thanks a lot for the thorough feedback! Going through it!

malekbr avatar Oct 14 '23 20:10 malekbr

The changes are a bit intertwined, the code assumes that signature = interface and structure = implementation in a lot of places. From an initial run through the code, we'd either need to duplicate some code to maintain the distinction for ppxes but not for the rest of the code, or add extra arguments to make the existing the code generic. An example is the Pparse.ast_kind type, which is used for both ppx rewriters and and the loading of files for the regular use of the compiler.

malekbr avatar Oct 14 '23 22:10 malekbr

For Pparse I wonder (but I have not tried) whether it is possible to keep most of the file working on structure and signature as before, and have just the toplevel functions parse_implementation and parse_interface (which are already not generic) return the located types. (The alternative is indeed to make ast_kind a two-parameter type, for example Structure : (structure, implementation) ast_kind, but bleargh.)

gasche avatar Oct 15 '23 05:10 gasche

For Pparse, I actually tried to make the change but didn't push it, and I think having two-parameter types is the least ugly solution if we want to preserve the behavior for the rewriters. But as you said, bleargh. It's about maintaining a balance between code simplicity and the stability of the interface we provide to ppx rewriters, and I don't know where that balance lies. The original reason why I went for the full change is, most changes to the AST incur costs to ppx maintainers, with libraries like ppxlib trying to reduce that cost. As a change to the AST, the ppxes would have to change anyway, and hopefully ppxlib would mitigate that. But in this case, the fact that we're renaming the entrypoint type might be too much.

malekbr avatar Oct 15 '23 18:10 malekbr

Another thing that makes me nervous with the pparse change is that some input modes of the compiler take serialized AST instead of source file, and I am not sure that your change preserves the format of the serialized AST. I see a potential for ugly bugs (eg. compiler segfaults) if we silently change the format without bumping magic numbers, etc. Not touching that part at all would be much more reassuring compatibility-wise.

gasche avatar Oct 15 '23 18:10 gasche

Also, I don't know if the change is actually useful for people that uses ppxes or provide serialized AST. Would they benefit from the extra location information, or is it already passed out of band? For ppxes I think it is part of the cookies, but not sure; I guess that this may not be the case for other producers of serialized AST (camlp5?), but this is a super-niche use case outside ppxes anyway. If there are no user benefits to the change, it is even less tempting to risk breaking compatibility.

gasche avatar Oct 15 '23 18:10 gasche

@gasche, any changes to the parsetree change the compatibility with the serialized ASTs. And there is already parsetree changes in trunk, there are no reason to try to keep compatibility with serialized ASTs.

Octachron avatar Oct 16 '23 08:10 Octachron

My reasoning not that I expect binary compatibility with older AST formats. What I have in mind is a user program that links against compiler-libs, and produces a serialized AST "manually" by doing something akin to output_value (ast : Parsetree.structure) (and also accessing Config.ast_impl_magic_number, etc.). If they update their program to trunk after the present PR is merged, the type-checker will ask them to update their code to the new parsetree definition (as expected), and then it will compile, but the compiler will segfault because we changed our binary-AST protocol to expect a distinct type Parsetree.implementation.

(Note: I looked at pparse again, and indeed the binary ast format includes the source file name as a separate payload, so it does not need the change. This being said, we could remove this tweak if we moved to implementation/interface, so the code would be slightly nicer / more regular.)

gasche avatar Oct 16 '23 08:10 gasche

I believe the English idiom is "lo and behold":

https://github.com/camlp5/camlp5/blob/f776d83014526f654b717e43ed408ec60702aa6b/meta/pr_dump.ml#L19-L27

gasche avatar Oct 16 '23 08:10 gasche

I tried the two argument ast_kind solution, and it's not as bad as I thought it would be. What should I go with for the typedtree prefix?

malekbr avatar Oct 22 '23 19:10 malekbr

@malekbr current status:

  • You will need to rebase over #12684, which is a bugfix for a pre-existing bug I found while reading your code; apologies...
  • @Octachron is also in favor of not adding new prefixes in typedtree records that were unprefixed, so let's try to move to that (the field name remains to be decided, and the amount of changes to do depends on the choice)
  • I proposed storing only a filepath instead of a full location. (Storing a location made sense for the original design of locating all structures and signatures, less so now that we only need to locate full files.) This could help with the choice of a good field name that does not introduce ambiguities.

I hacked a bit on top of your PR again to see if it was possible to simplify the changes to pparse.ml. The result is available at https://github.com/gasche/ocaml/commits/add-interface-and-implementation-types-minimize-pparse.ml-changes . In this branch, the changes to driver/pparse.ml{,i} are really minimal. I would be rather in favor of this or a similarly-minimal approach.

gasche avatar Oct 24 '23 15:10 gasche

I'm currently in the process of rebasing this PR so that we can move it forward. I cannot push to @malekbr's branch, so I pushed the new version in a new branch on my fork:

https://github.com/gasche/ocaml/tree/add-locations-to-implementations-and-interfaces

I still have some testsuite changes to iron out, but I hope to open a new PR for review shortly.

gasche avatar May 15 '25 12:05 gasche

I am done rebasing the branch and I just updated the current PR.

gasche avatar May 15 '25 20:05 gasche

I remain concerned with the backward-compatibility impact of this change. Here is the current state of the PR, as far as I understand it.

  1. Before we used interface and signature mostly interchangeably, same for implementation and structure.
  2. Now they are distinct type: implementation/interface carry location information, structure/signature do not.
  3. Whenever a function foo_structure was exposed that expected a structure, we now offer both foo_structure (that expects a structure) and foo_implementation (that expects an implementation); this is backward-compatible
  4. But other places would expose foo_implementation and expect a structure. Here one could keep foo_implementation that expects a structure and define foo_implementation' that expects an implementation. Instead what this PR does it to change foo_implementation to expect an implementation, and add foo_structure to expect a structure. This breaks backwards-compatibility -- and there is no obvious way to write compiler-libs-using code that is compatible with both older OCaml versions and trunk.

For example, manual/tests/cross_reference_checker.ml has code that parses a .ml file with Parse.implementation, and then processes it with iterator.structure. This code was correct before and now it breaks, because the types are now incompatible. The solution is either to use Parse.structure (and keep iterator.structure), or to (keep Parse.implementation and) use iterator.implementation. Neither changes are compatible with previous compiler-libs interface.

My impression is that the breakage to be expected are relatively few and far between, and that people can deal with that -- we never said compiler-libs was stable. But they may need version-conditional preprocessing, which is meh. My hope is that no change is needed for people using the ppx machinery (I tried to make sure that everything used there remains available), and that those are the main compiler-libs user.

gasche avatar May 18 '25 11:05 gasche

I believe that a second opinion would be useful to move this forward (merge it, I hope, or close it); maybe @Octachron? (I cannot credibly act as a reviewer myself now that I started modifying the code in more substantial ways.)

gasche avatar Jul 24 '25 04:07 gasche

A point to keep in mind in term of backward compatibility is that nearly all libraries that heavily depend on compilerlibs already relies on preprocessing and compatibility modules.

From this perspective, the fact that none of the toplevel interfaces are affected seems like a good sign that the change is not that visible from light users of compilerlibs.

Octachron avatar Jul 24 '25 09:07 Octachron

I was wondering why there were not changes in the testsuite, but it seems a good part of the initial problem was solved by the switch to Unit_info in #12389 which computes the interface source name from the ml filename.

Octachron avatar Jul 25 '25 15:07 Octachron