codemap
codemap copied to clipboard
CodeMap::remove_file
Useful for long running processes with state (think REPL).
Current implementation is linear, can be optimized later to be
O(1).
So far the goal has been to make it hard to obtain invalid spans, while going this direction probably means that more functions should return Option. What happens if a line in your REPL defines a function that is later involved in an error and needs a span lookup?
I worry about adding methods for operations the datastructure isn't really designed for. The "index into concatenated files" technique that CodeMap uses is a better fit for batch compilers. For things like language servers that may process more code than a REPL, the 32-bit position space could easily run out. I believe rustc incremental compilation used in RLS stores uncompressed spans across requests, and uses a fresh CodeMap only within the code that updates the index for a crate.
What happens if a line in your REPL defines a function that is later involved in an error and needs a span lookup?
A function may have a reference to a struct which references File and CodeMap, and remove that File when last reference to File is dropped.
32-bit position space could easily run out
Yep, I think it should be 64-bit.
fresh CodeMap
I was thinking about having own struct like
struct SpanWithCodeMap { span, Rc<CodeMap> }
But unfortunately codemap-diagnostic crate heavily depends on Span without context (e. g. struct Diagnostic or emitter which requires all spans to belong to single code map), so it's hard to use that struct without rewriting copy-pasting a lot codemap-diagnostic.
That's why I came up with this solution.
Here: https://github.com/kevinmehall/codemap-diagnostic/blob/c0eaceccd13fe05dee1903a83e1421aafd9584bb/src/lib.rs#L62 Diagnostic requires all span to belong the same CodeMap.
Yeah, I think you're running into some of the same limitations that I have. The "index into concatenated files" thing from rustc seems clever but is probably premature optimization for many cases.
One change I've been thinking about is to make Arc<File> more like a single-file CodeMap that could be used standalone or added to multiple different CodeMaps. For something like an AST, where everything comes from a single source file, it would work to store the Arc<File> next to the AST root (or next to each function AST, or whatever), and have the AST contain spans relative to that file. That would make it easier to create, destroy, and replace ASTs without mutating a global CodeMap. When lowering the AST into something that contains spans from multiple files, the files could be added to a CodeMap and the spans translated. In the language I'm working on, that CodeMap would be short-lived and more like a batch process, but I'm not sure that's universal.
I think the codemap-diagnostic API should replace its use of CodeMap with a trait that could be implemented for CodeMap, File, or external custom types. I was thinking that Span should be an associated type of that trait, but you bring up a good point that it may be limiting to require all spans within the same diagnostic message to come from the same CodeMap. One complication there is it must be able to relate two spans to each other if they come from the same file, because it can display multiple annotations in one snippet.
OK, there is a couple of ways to fix this issue. Do you know how to fix it without breaking backward compatibility? (Codemap-diagnostic follows semver conventions I presume.)
I'm currently working on a project that codemap-diagnostic would be very useful for, however, I've also found that many of the design decisions made in codemap are not ideal for my particular project or coding style. And, frankly, there will never be a one-size-fits-all answer. Just, perhaps, "the simplest one that works" as a default.
A lot of these trade-offs were made intentionally for one reason or another in rustc, and have changed over times. There's a lot of projects that would want the ready-made (and great) compiler error emitter, but may want to make very different tradeoffs in the code map/span encoding.
I'd be willing to work on making codemap-diagnostic agnostic to the underlying code map implementation to help with such issues.
Let's move the codemap-diagnostic side of this over to an issue on that repo: https://github.com/kevinmehall/codemap-diagnostic/issues/7