grace icon indicating copy to clipboard operation
grace copied to clipboard

Ability to use cartesian positions directly

Open giltho opened this issue 7 months ago • 6 comments

Hello,

Unless I cannot find the correct API, it looks like it is only possible to create diagnostics using byte indexes within the source. What I currently have access to is only cartesian positions, so I have to open the file and transform that line-offset into byte index. Then, the renderer re-opens the file to do the printing, and to transform it back to cartesian coordinates.

Would it be possible add an API to create Ranges from cartesian coordinates? I'm happy to do a PR if you are ok with this, it will take time because I have very limited time right now.

giltho avatar Jun 05 '25 12:06 giltho

Hey 👋

Unless I cannot find the correct API, it looks like it is only possible to create diagnostics using byte indexes within the source.

Yes, only byte-oriented positions are supported at the moment.

What I currently have access to is only cartesian positions

What parsing library are you using? Both Menhir and Sedlex will be able to give you byte-oriented positions

so I have to open the file and transform that line-offset into byte index. Then, the renderer re-opens the file to do the printing, and to transform it back to cartesian coordinates.

If its any better, I could easily make a patch to avoid re-reading the file for line offsets by allowing the pp_diagnostic functions to take a custom line_starts_fn when configuring the Source_reader, see here for more details.

I'm happy to do a PR if you are ok with this, it will take time because I have very limited time right now.

Contributions are definitely welcome! Though it might be easier to try recover the original byte offsets you might've had.

johnyob avatar Jun 05 '25 18:06 johnyob

What parsing library are you using? Both Menhir and Sedlex will be able to give you byte-oriented positions

I did some digging -- I assume this issue is related to your contribution on https://github.com/soteria-tools/soteria/pull/69.

Looking at it, it seems you're using the Cerberus parser (which uses Menhir under the hood). The function you'll want to convert Cerb_position.t to Grace's Byte_index.t is:

let cerb_pos_to_idx cpos = Byte_index.of_lex (Cerb_position.to_file_lexing cpos)

johnyob avatar Jun 05 '25 19:06 johnyob

Oh thank you for looking into this!! I appreciate your time :)

Looking at it, it seems you're using the Cerberus parser (which uses Menhir under the hood). The function you'll want to convert Cerb_position.t to Grace's Byte_index.t is:

I did try that before writing my byte-index retrieval, but unfortunately Cerberus doesn't parse the C source file, it parses the file that the C preprocessor outputs. If one tries to us Byte_index.of_lex, it just yields an out-of-bound exception.

E.g. take the C file

#include <stdlib.h>

int main() { return *NULL }

The preprocessor will add the entirety of stdlib.h (several hundred lines) before that, and the byte_index of *NULL will be a few thousands.

Cerberus keeps track of the correct line of code before pre-processing, though it might keep track of the wrong column number if there is a preprocessing happening within a line but heh 🤷 Anyway, one cannot trust the byte-offset given by Cerberus's lexer :(

giltho avatar Jun 05 '25 20:06 giltho

If its any better, I could easily make a patch to avoid re-reading the file for line offsets by allowing the pp_diagnostic functions to take a custom line_starts_fn when configuring the Source_reader, see here for more details.

Definitely! That being said, printing error messages is currently not a bottleneck of our tool at all, so no pressure on our end to investigate.

giltho avatar Jun 05 '25 21:06 giltho

I was thinking about this, what if we parameterized the range type in diagnostics e.g.

module type Range = sig 
  type t 
  include Pretty_printer.S with type t := t 
  
  (** Able to use [Source_reader] in here. *)
  val to_range : t -> Range.t 
end

This would allow you to use a custom range representation and only need the translation when printing?

johnyob avatar Aug 22 '25 11:08 johnyob

Very interesting you should say that! 2 days ago I made the Range notion in our tool a parameter type!

At first, I tried using Grace's notion of range, but I quickly realised that no notion of range is perfect, and all tools will have their own kind of range. Anyway, I fully support this :)

giltho avatar Aug 22 '25 13:08 giltho