RFCs icon indicating copy to clipboard operation
RFCs copied to clipboard

Implicit source positions

Open OlivierNicole opened this issue 9 months ago • 77 comments

Rendered version

Introduction of the RFC:

A new type of arrow is introduced. Syntactically, it looks like:

val f : loc:[%call_pos] -> ...

in a signature, and

let f = fun ~(loc : [%call_pos]) ... ->
  ...

in an implementation.

The label name doesn’t matter, but there must be a label. The function parameter thus defined has type Lexing.position and, if not specified at a call site, is automatically filled with the position of the call site.

This can serve several purposes:

  • When chaining monadic binds, the usual stack backtraces are very noisy and less useful. You can replace (>>=) : 'a t -> ('a -> 'b t) -> 'b t with (>>=) : loc:[%call_pos] -> 'a t -> ('a -> 'b t) -> 'b t to automatically track binding locations (and then implement a tracing facility).
  • It can be used to improve the usefulness of testing frameworks (see e.g. https://github.com/mirage/alcotest/pull/366) or tracing systems (such as meio[^1]), or error reporting in embedded languages.

History

A first PR[^2] has been proposed ten years ago by @let-def. After some design amendments, that PR was stalled for a while, until renewed interested was expressed in 2023 by @TheLortex. The feature was then discussed during a developer meeting and gathered consensus on the principle[^3]. It was subsequently implemented at Jane Street and has been in use there for several months.

The goal of this RFC is therefore to present the details of the design in use at Jane Street to make sure there is agreement to port the feature upstream. Jane Street has asked Tarides for help in the summarizing work that led to this RFC.

OlivierNicole avatar Feb 21 '25 18:02 OlivierNicole

cc @goldfirere @MisterDA

OlivierNicole avatar Feb 21 '25 18:02 OlivierNicole

This feature has widespread use already within Jane Street. It was originally implemented by @vivianyyd, with much help getting it into production by @Enoumy.

goldfirere avatar Feb 21 '25 18:02 goldfirere

As mentioned previously, there is broad maintainer consensus in favor of the feature. I suppose that the purpose of the RFC is more to discuss the proposed syntax and the practical details.

The proposed syntax looks fine to me.

The -directory bit (and the partial overlap with BUILD_PATH_PREFIX_MAP) stands out as worth investigating: if we implement the feature without this, how does it work in practice in multi-directories projects? Does it work well with Dune? What needs to be changed for it to work better?

I think that we may need a PR that implements the language-level feature to answer these questions, unless you can use the Jane Street compiler with upstream dune to make tests and report on their behavior.

gasche avatar Feb 22 '25 09:02 gasche

The -directory bit (and the partial overlap with BUILD_PATH_PREFIX_MAP) stands out as worth investigating: if we implement the feature without this, how does it work in practice in multi-directories projects?

Note that it's hard to answer these questions without fixing BUILD_PATH_PREFIX_MAP first.

My gut feeling is that with a correctly working BUILD_PATH_PREFIX_MAP, you don't want to add yet another compiler flag to manipulates what gets into __FILE__. The specification of __FILE__ should simply be: the file path as specified on the cli as transformed by the BUILD_PATH_PREFIX_MAP active during the invocation.

dbuenzli avatar Feb 22 '25 10:02 dbuenzli

Does it work well with Dune?

As mentioned, all Dune compiler invocations are done from the workspace root, so compiler paths are paths relative to the workspace root as well, so I think the answer to this question is "yes".

ocamlbuild displays a similar behaviour, except that the paths are relative to the directory from which ocamlbuild is invoked.

nojb avatar Feb 22 '25 13:02 nojb

Thanks! This suggest that we could forget about the -directory part of the RFC, unless I am missing something.

gasche avatar Feb 22 '25 13:02 gasche

Yes, this does suggest we could drop the -directory bit. I'm happy to. The worst case scenario is that we've miscalculated here and that directories won't always show up correctly in some scenarios, in which case we can take another stab at fixing this -- not so bad.

goldfirere avatar Feb 25 '25 01:02 goldfirere

Conversation seems to have died down here, with the following results:

  • There is consensus in favor of adding the feature, with the given semantics (basically, echoing today's optional parameters but with a new behavior in the case that the caller omits the argument).
  • There is consensus around the following bits of syntax:

In types: e.g.

val f : ?loc:[%call_pos] -> unit -> unit

In function definitions: e.g.

let f ?(loc = [%call_pos]) () = ()
(* or *)
let f = fun ?(loc = [%call_pos]) () -> ()

In function call sites: e.g.

let () = f ()   (* omitted *)
let () = f ~loc:some_lexing_position ()   (* supplied *)
let () = let loc = some_lexing_position in f ~(loc : [%call_pos]) ()   (* supplied, with a hint to type inference that [loc] is a call-pos parameter, not a labeled one *)
let () = f ~loc:(some_lexing_position : [%call_pos]) ()  (* more direct syntax for the same thing *)

These last two examples will be very rare in practice, useful only really in higher-order contexts where we can't know the type of f beforehand. Using [%call_pos] as a type in any other place will be an error. (So, for example, let loc : [%call_pos] = some_lexing_position in f ~loc () would be an error.) Support for these last examples was added by demand within Jane Street.

Note also that it is never allowed to pass a call-pos parameter with ?, even though ? is used both in the type and the function definition. (This is not all that different from ordinary optional parameters, which always use ? in types and at definitions, but only rarely at call sites.)

  • There is consensus against adding a -directory flag, as it is not needed with dune.

Does that summarize the state of play well here? Once this gets merged in, we at Jane Street will adjust our use of this feature to match the new syntax.

Thanks!

goldfirere avatar Mar 13 '25 19:03 goldfirere

Does that summarize the state of play well here?

The summary is missing the discussion about:

  1. Positions versus locations.
  2. The actual datatype to use.

Regarding 1. I think this should be implicit source locations which span the application and thus the syntax should rather be [%call_loc].

Regarding 2. I think it would be fine to reuse what has be used for __POS_OF__, which, despite the misleading name represents a location.

dbuenzli avatar Mar 13 '25 22:03 dbuenzli

(This is not all that different from ordinary optional parameters, which always use ? in types and at definitions, but only rarely at call sites.)

I wouldn't say it's rarely used at call sites, you often use ? in call sites to propagate defaults "upwards", e.g.

let primitive ?(limit = 3) x = …
let derived_combinator1 ?limit y = … primitive ?limit …
let derived_combinator2 ?limit y = … primitive ?limit …

dbuenzli avatar Mar 13 '25 22:03 dbuenzli

The summary is missing the discussion about:

  1. Positions versus locations.
  2. The actual datatype to use.

Yes, thanks for reminding me about this! I think that (2) determines (1) -- that is, the actual data captured should dictate the name. After asking around within Jane Street, there is considerable enthusiasm for keeping just the single position, rather than the whole span. We don't need the full span, and the extra information is likely to have a performance impact (this surprised me, too). So the thought here is to keep the existing [%call_pos] feature using Lexing.position, but if there is demand, we could have an additional feature [%call_loc] that also grabs the end position.

It would not be hard to take the existing implementation of [%call_pos] and extend it to also support [%call_loc] -- but that step would actually take a little bit of extra design, given that there's no suitable type in the stdlib to track spans instead of single locations. I propose we separate these ideas and focus only on the [%call_pos] idea here. (I would favor [%call_span] over [%call_loc] for the other feature -- to me, "span" more clearly includes an endpoint. But I don't feel strongly.)

goldfirere avatar Mar 14 '25 20:03 goldfirere

My intuition is that providing a location/span is likely to have various usability benefits in practice for some use-cases. (It's not even obvious what the 'position' of a function call is, I suppose we mean the starting position?) Given that this feature is precisely defined for usability, I would be inclined to prioritize this instead of performance.

I wouldn't believe the performance argument without more explanations. Can you elaborate what the issue would be? Note that Lexing.position is a four-element record that could probably be packed more efficiently if we wanted to consume less space. (It is probably possible to store a location/span in less space than your current proposal uses to represent a single position.)

If performance is expected to be a concern, I would propose using an abstract type for the span/location, with accessors that return various information -- could start as just val start : t -> Lexing.position val end : t -> Lexing.position. This would let us start with whatever representation we like, and optimize it later on without breaking clients.

gasche avatar Mar 14 '25 20:03 gasche

We don't need the full span, and the extra information is likely to have a performance impact (this surprised me, too).

The runtime representation that __POS_OF__ uses for locations is exactly the same as Lexing.position, namely a quadruplet. So memory consumption is not going to change. Now the thing that changes is that if you want to extract the byte range of the span in the sources you need to count the number of newlines in your sources, I doubt that's going to be a performance hog in practice.

Another take would also be to simple have string * int * int a filename and inclusive byte range. Though that would make it harder for clients to directly spit out a line number which would hinder certain applications where you are not interested in processing the sources but only referring to them using OCaml's error message format (e.g. logging).

Personally I'm not super fond of these ad-hoc primitives so I think we should rather aim for having a single one and one that allows to express the functionality of __POS_OF__ that is the location span of a function call in the format the compiler uses to report errors, warnings and stack traces.

As far as naming goes, text locations vs position seems quite established in the compiler and beyond so I think simply having call_loc would be suitable.

but that step would actually take a little bit of extra design, given that there's no suitable type in the stdlib to track spans instead of single locations.

I think I have already mentioned at least two times that this is a red herring, you can simply reuse what __POS_OF__ uses.

dbuenzli avatar Mar 14 '25 21:03 dbuenzli

I'm quite interested in the proposed feature. I currently use __FILE__ and __LINE__ for tracing-related purposes. I think @dbuenzli is right to suggest the reuse of the existing types.

In terms of performance, just to make sure, the use of __POS__ or __POS_OF__ should not allocate at runtime, correct? The compiler should put it into a static variable?

/tmp $ /bin/cat truc.ml 

let f x =
  ignore (Sys.opaque_identity __POS__);
  x
/tmp $ ocamlopt -dclambda truc.ml

clambda:
(seq
  (let
    (f/270
       (closure 
         (fun camlTruc.f_270 1  x/272
           (seq
             (seq
               (opaque "camlTruc.2"=block(0,"camlTruc.1"="truc.ml",3,30,37))
               0)
             x/272)) ))
    (setfield_ptr(root-init) 0 (read_symbol camlTruc) f/270))
  0)
camlTruc.1:
"truc.ml"
camlTruc.2:
block(0,"camlTruc.1"="truc.ml",3,30,37)

-- Simon Cruanes

c-cube avatar Mar 15 '25 01:03 c-cube

On the "performance" angle, there's really two issues:

  1. Binary size: @c-cube is right that these values aren't allocating at runtime, but they still take up space in the binary, which does matter for js_of_ocaml programs, especially when you have thousands to tens-of-thousands of instances like we do in some programs.
  2. Conversion: Though you could always convert a span to a position, that conversion will probably allocate at runtime. Also, this language extension is great for reducing boilerplate for callers, it would be a shame if it introduced more boilerplate for callees in the form of these conversions.

Now, I don't expect that my "Conversion" point will be convincing if you think that spans are always preferable, but Lexing.position has served us quite well, with the [%here] ppx, and now with [%call_pos]. In their widespread use at Jane Street, [%call_pos] is almost always used to provide a better debugging and instrumentation experience for end-users, and the position of the invoked function is both unambiguous and desirable (especially when formatting it in a way that most text-editors have good "jump to file/line/column" support for). Although I'm sure there are examples of places where having the whole span would be better, if we only provided [%call_span], I'd expect that 99% of use-sites would immediately convert it into a Lexing.position, and that would be a shame.

Another take would also be to simple have string * int * int a filename and inclusive byte range. Though that would make it harder for clients to directly spit out a line number which would hinder certain applications where you are not interested in processing the sources but only referring to them using OCaml's error message format (e.g. logging).

As mentioned above, surfacing positions for logging/telemetry/debugging tools is definitely the main use case for this feature.

I think I have already mentioned at least two times that this is a red herring, you can simply reuse what __POS_OF__ uses.

I like Lexing.position. Unlike __POS_OF__ which returns (string * int * int * int), Lexing.position is a record, so it's harder to accidentally misuse or make mistakes constructing one (I'm really not a fan of having three ints floating around and having to remember which one corresponds to what.) Also, but it looks like __POS_OF__ doesn't handle expressions that span multiple lines, which are pretty common for function calls.

fwiw, I'm not suggesting that we ignore spans, I think it'd be nice to have both [%call_pos] and [%call_loc]. This proposal is motivated by improving the usability of the language, and I'd be pretty sad if I had to litter my codebase with conversion functions just because it's conceptually cleaner to only support one construct instead of two.

TyOverby avatar Mar 15 '25 14:03 TyOverby

2. Conversion: Though you could always convert a span to a position, that conversion will probably allocate at runtime.

If all you do is format, unlikely. It will not be different from the transform performed by the code you linked below:

(especially when formatting it in a way that most text-editors have good "jump to file/line/column" support for).

That's not a very good format. Unlike OCaml's error and stacktrace format, it will routinely fail to locate the right bit on sources with Unicode characters. Besides personally, in my program, I'd rather not have a different syntax for stacktraces and instrumentation statements – there's no need to overload my brain with different notations. So I wouldn't take that as an argument in favour of the information given by Lexing.position.

Lexing.position is a record, so it's harder to accidentally misuse or make mistakes constructing one (I'm really not a fan of having three ints floating around and having to remember which one corresponds to what.)

I'm not against adding a type for locations as formatted by OCaml. Also as already mentioned there is one in Printexc.location though for reasons that are unknown to me it has one more integer slot.

Also, but it looks like __POS_OF__ doesn't handle expressions that span multiple lines, which are pretty common for function calls.

It absolutely does handle multiple lines. There seems to be misunderstandings about OCaml's error location and stacktrace reporting format which I explained in the discussion here and which is what __POS_OF__ gives you.

dbuenzli avatar Mar 15 '25 15:03 dbuenzli

If all you do is format, unlikely. It will not be different from the transform performed by the code you linked below:

Just to make that clear with a location __POS_OF__ the snippet code you refered to would just turn into:

(* This is the same function as Ppx_here.lift_position_as_string. *)
let make_location_string (pos_fname, pos_lnum, pos_first, _pos_last) =
  String.concat
    [ pos_fname; ":"; Int.to_string pos_lnum; ":"; Int.to_string pos_first ]
;;

dbuenzli avatar Mar 15 '25 16:03 dbuenzli

Besides personally, in my program, I'd rather not have a different syntax for stacktraces and instrumentation statements – there's no need to overload my brain with different notations.

That's fine; you can always use [%call_loc]!

Also, but it looks like POS_OF doesn't handle expressions that span multiple lines, which are pretty common for function calls. It absolutely does handle multiple lines.

If I wanted to format a span for people to read and understand, then I'd expect (start-line start-col), (end-line, end-col), not whatever "pos_last" is.

let make_location_string (pos_fname, pos_lnum, pos_first, pos_last) =  
  Core.String.concat [ 
    pos_fname; 
    ":"; Int.to_string pos_lnum; 
    ":"; Int.to_string pos_first; 
    ":"; Int.to_string pos_last ]

let positions, _ = __POS_OF__(
  let x = 5 in 
  let y = 10 in 
  x + y)

make_location_string positions
"//toplevel//:1:19:72"

If I'm reading that string, it's not obvious at all what span of code is covered by the "72" there. Maybe a tool could use that in a useful way, but as I've mentioned, the primary usecase here is for people to read.

  1. Conversion: Though you could always convert a span to a position, that conversion will probably allocate at runtime. If all you do is format, unlikely. It will not be different from the transform performed by the code you linked below:

Could you expand on this? It sounds like you're saying that building a Lexing.position record at runtime or doing string concatenation doesn't allocate, which can't actually be your claim.

Unlike OCaml's error and stacktrace format, it will routinely fail to locate the right bit on sources with Unicode characters.

let x = "a" in __POS_OF__ ();
- : (("//toplevel//", 1, 15, 28), ())

let x = "🐫" in __POS_OF__ ();;
- : (("//toplevel//", 1, 18, 31), ())

Looks like __POS_OF__ doesn't do great either in this regard either if you expect it to count visual characters instead of byte offsets. Fwiw, at least vim counts bytes, so I actually prefer this behavior.

Oh also, neither do type error messages:

let x = "a" in __POS_OF__ (1 + "hi");;
Line 1, characters 31-35: Error ...
let x = "🐫" in __POS_OF__ (1 + "hi");;
Line 1, characters 34-38: Error ...

TyOverby avatar Mar 15 '25 16:03 TyOverby

That's fine; you can always use [%call_loc]!

That's not the point, what I'm saying is that the link you gave on how you want to format things for users with a [%call_pos] and Lexing.position is trivially embedded in my proposal with [%call_loc]. It will even save you a subtraction. So I don't see a strong need to add multiple things here.

Could you expand on this? It sounds like you're saying that building a Lexing.position record at runtime or doing string concatenation doesn't allocate, which can't actually be your claim.

What I said, and as shown by my snippet, is that there are going to be no conversion costs between locations as I want them and positions as you want them, as you claimed there would be in your point 2.

Looks like __POS_OF__ doesn't do great either in this regard either if you expect it to count visual characters instead of byte offsets. Fwiw, at least vim counts bytes, so I actually prefer this behavior.

I does if you have a properly configured editor, that understands the semantics. Trying to count visual character in the Unicode world is a doomed endeavour, even counting Unicode scalar values won't do, you need feedback from the rendering layer. Talking in (relative as far as OCaml's format is concerned) bytes with the editor is the only reliable way.

In that respect the GNU standard error format which expects you to count visual columns with wcwidth is broken in the real world of Unicode.

Now the thing with OCaml's error information is that it is in fact a marvel of nice tradeoffs, these 3 integers – which again have the same memory footprint as a Lexing.position and can report the same broken positions – allows you to:

  1. Address a full, precise, multi-line span of Unicode text in a source, for editors that understand the semantics of the format.
  2. Has a graceful degradation for humans if they can't use the machine. They can read the line number and the first column number (exactly like the way you want to format your Lexing.position) which will only be an approximation if your line has Unicode characters (again exactly like a Lexing.position). The last number is a bit more difficult to comprehend humanely but you can still use it if you are in the secrets of the gods.

dbuenzli avatar Mar 15 '25 17:03 dbuenzli

Here is a hopefully fair summary of the debate between “point location” and “span location”. A full location span would be more general than a single position, however @TyOverby expresses concerns about the consequences in terms of binary size and conversion costs.

@dbuenzli makes the point than using the same format as used by __POS_OF__ namely,

  • the filename
  • the line number of the start position, counting from 1
  • the byte offset relative to the beginning of that line
  • the byte offset of the end position relative to the start position

takes up the same size as the currently used Lexing.position and has strictly more expressive power, and the same limitations regarding Unicode. Also it directly contains the three often used values: filename, start line, start “column”.

And indeed with a position encoded in this format, one can do everything that can be done with Lexing.position and more (for example, if the source file contents are available, one can compute the end line and end “column”.

Suggestion: we could make the location type abstract with accessors as suggested by @gasche so that the actual representation is an implementation detail. For starters, we could implement it using the format described above, as it takes up no more space than the current proposal and allows to do the same things without cost, and also to derive a span.

In the future, if needed, we can always switch to a more compact and flexible representation, such as the one used in native debug info (see https://github.com/ocaml/ocaml/pull/10111).

OlivierNicole avatar Mar 17 '25 16:03 OlivierNicole

I'm convinced that we should favor usability over performance here. The performance costs that we are talking about are mostly minor (how large already are the js_of_ocaml outputs that we would be adding tens of thousands of words to?), and they can be further reduced by being clever.

gasche avatar Mar 17 '25 16:03 gasche

Maybe I’m already tired at this time of the afternoon, but it’s not entirely clear from your message what part of my suggestion you are objecting to, and what you propose to replace it with.

OlivierNicole avatar Mar 17 '25 16:03 OlivierNicole

Your summary is fine! I am clarifying my own opinion: after consideration and the further discussion, I am now convinced that we should use locations and not positions.

gasche avatar Mar 17 '25 17:03 gasche

Some further comments.

You suggest to start with the same internal representation as POS_OF, and I think this is very reasonable.

Deciding whether to hide the representation within an abstract type or not is not obvious:

  • As @dbuenzli point out, the current hack of POS_OF has been serving us well, so maybe it would be fine to keep using this. (Maybe it's possible to use labelled tuples here, without breaking POS_OF-using code?)
  • On the other hand, if you want a chance to optimize the representation later on, you need to ask for an abstract type now. If I had JaneStreet-sized binaries I would support this, so my guess would be that you @OlivierNicole should explore this option.

Designing an abstract type for this is not obvious, in particular because for anything non-structural we need this type to be defined in the standard library -- not compiler-libs. Maybe an option would be to introduce a new type in the Lexing module, Lexing.Loc.t or Lexing.Span.t? (This is a hack because Lexing is meant to provide the runtime for ocamllex, not for this, but oh well, aren't those fine names?) For now @dbuenzli is in favor of keeping the trite design that we inherited from POS_OF, but maybe he could be convinced if we do something clearly better; @dbuenzli, you may even have suggestions for an API for such a submodule?

Maybe a dummy proposal to get things going:

module Lexing.Location : sig
  type t
  val file : t -> string

  val start_pos : t -> Lexing.position
  val end_pos : t -> Lexing.position

  val start_line : t -> int
  val end_line : t -> int

  val start_column: t -> int (** TODO document 0- or 1-indexing and that it is a byte offset *)
  val end_column: t -> int

  val start_offset : t -> int (** TODO document that it is a byte offset *)
  val end_offset : t -> int

  val to_POS_tuple : t -> (string * int * int * int) (** See {!__POS_OF__}. *)
end

I am not sure that it is possible to implement end_column without recomputations if we use the POS_OF tuple format, and maybe this would be an argument in favor of not doing that.

Edit: the constructor for this abstract type is missing from this API sketch. There should be one. I suppose that the compiler would be synchronized with the internal implementation, and unsafely inject a structured constant matching this representation.

gasche avatar Mar 17 '25 17:03 gasche

@OlivierNicole it's not clear what you mean by the start position, but this is not what the OCaml error format does:

  • the byte offset of the end position relative to the start position

Both offsets are relative to the start of the line identified by the line number.

and the same limitations regarding Unicode

For humans. But for machines, if the process in charge of highlighting the location information is smart about it, it allows to overcome the problems posed by Unicode (by specifying the location precisely in the text encoding space rather than in the visual space).

Regarding reusing the format of __POS_OF__ that's not something I'm especially attached to. What I'd like is the location (vs. a Lexing.position).

I did however wonder whether keeping the same format would allow to use __POS__ (which is also a location, albeit always occurs on the same line) and __POS_OF__ in interesting way in an explicit specification of a [%call_loc] argument at the call site. But I don't think so (?).

Maybe a dummy proposal to get things going:

@gasche, note that if you want to keep the memory footprint of __POS_OF__/Lexing.position you can't recover most of the information you mentioned there (e.g. Lexing.position values, end_line). Also I would disagree with quite a few of the names there (e.g. avoid the whole column business, that idea is dead).

Regarding a suggestion, in a few projects now I have been using this for text locations which allows to recover all forms of error reporting (OCaml-like or broken GNU multi-lines) but it's a bit more heavy weight (6 integers, though they could be flatten in a single record).

Something I would like to understand is why two more field in Printexc.location were added in 5.2, because without them, it's actually represents __POS_OF__ exactly and these additions look rather ad-hoc to me.

dbuenzli avatar Mar 17 '25 23:03 dbuenzli

I'm not worried about having 6 integers, as one could probably pack them into three Int63 values. (I'm assuming it is reasonable to assume that these line numbers or byte offsets inside one single source file will each fit 31 bits.) If we use an abstract type, it would also be easy for people who maintain a fork of the compiler to choose a different internal representation that preserves less information, and return -1 on some of the accessors.

gasche avatar Mar 18 '25 10:03 gasche

I'm not worried about having 6 integers, as one could probably pack them into three Int63 values.

Another option would be to have a string with all these numbers packed in sequence one after the other using a simple unsigned variable byte encoding with a continuation bit. This is what is done for example in source maps (signed there apparently and wrapped in a base64 layer).

The nice thing about the Textloc.t type I linked to is that you get absolute byte offsets in the file which means you don't have to perform an elaborate dance whenever you want to find the concrete byte range to perform surgery on the sources.

dbuenzli avatar Mar 18 '25 14:03 dbuenzli

And of course once you start packing stuff in variable encodings you should encode differences to make that really tight. So you could encode in order,

first_line_byte
first_line_num
first_byte_diff
last_line_byte_diff
last_line_num_diff
last_byte_diff

and recover the full data with

first_line_byte
first_line_num
first_byte = first_line_byte + first_byte_diff
last_line_byte = first_byte + last_line_byte_pos_diff
last_line_num = first_line_num + last_line_num_diff
last_byte = last_line_byte + last_byte_diff

dbuenzli avatar Mar 18 '25 14:03 dbuenzli

I think the performance worries above about conversions between the __POS_OF__ 4-tuple and Lexing.position are overblown. The two types have the same representation - as far as I know, the only reason they're different is to avoid a tricky dependency between the compiler primitive and the Lexing module.

You can convert between them using Obj.magic if you're feeling adventurous, or if you're using the flambda2 compiler you can write the safe conversion function:

fun (pos_fname, pos_lnum, pos_first, pos_last) -> {pos_fname; pos_lnum; pos_first; pos_last}

and it will be compiled to a no-op.

stedolan avatar Mar 19 '25 11:03 stedolan

The two types have the same representation

Yes this has been mentioned many times. Note however that they do not represent the same thing. You can't recover the semantics of a Lexing.position from the data of a __POS_OF__.

dbuenzli avatar Mar 19 '25 11:03 dbuenzli