fsharp Can't see range text in FCS tests

It's difficult to analyze the tree node sources in the tests:

A possible fix and other options suggestions have been discussed in #13403.

Apr 28 '25 20:04 auduchinok

Let's see if copilot can figure it out

May 20 '25 10:05 vzarytovskii

There are only a few places in the codebase where the source text and associated filename meet. If we identify one that occurs early enough that we can hope it covers most of our use cases, for example here:

https://github.com/dotnet/fsharp/blob/087ffa2d42579baec3b6adcd9b8f568c7dd14ccb/src/Compiler/SyntaxTree/LexHelpers.fs#L116

we can just cache the contents keyed by filename. Make that cache exist only in debug mode and visible from Range.DebugCode.

May 20 '25 14:05 majocha

we can just cache the contents keyed by filename

Would we want to add another layer of complexity? I'd rather want to remove all file system access from the compiler core and have string-based sources only. But that's of course a bigger task.

May 20 '25 15:05 Martin521

we can just cache the contents keyed by filename

Would we want to add another layer of complexity? I'd rather want to remove all file system access from the compiler core and have string-based sources only. But that's of course a bigger task.

Absolutely agreed. But that would be orthogonal to the issue. In fact, this way we could at least remove the current debug mechanism of reading from filesystem in Range.DebugCode.

May 20 '25 15:05 majocha

Just to chime in, we use the File System shim heavily in Rider for the file sources, since we have to use the IntelliJ virtual file system instead of the normal one.

May 20 '25 15:05 auduchinok

I think that fits well.

The good idea really is to eliminate direct FS access from the compiler itself, and only work with ISourceText and other abstraction (some of which might not yet exist). And the abstraction might as well be coming with a default implementation being an FileSystemProvider or whatever.

Is the complexity/benefit ration good here?

May 20 '25 17:05 T-Gro

Yes, I think so. At some level (like fsc and FSharpChecker) of course we need to deal with file system IO. But not deep down in range or CompilerDiagnostics.

I believe we would need a type Source which contains a name (unique in the project, typically the source file path) and the source text (or ISourceText). The FileIndexTable would then map FileIndex and Source (rather than FileIndex and FileName). We then have source access through range and no longer need, deep in the compiler, any re-reads from the file system.

May 20 '25 20:05 Martin521

Since ISourceText represents a point in time it would also need a notion of change, so basically this https://learn.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.text.sourcetextcontainer?view=roslyn-dotnet-4.13.0 with a optional filename.

May 21 '25 06:05 majocha

Yes, that's a way to do it. But the whole compiler core (parse, type check etc) works on whole files and cannot process local changes. So, I wonder why a simple immutable System.String wouldn't be the right representation of source text. Changes in time are dealt with in higher layers.

May 21 '25 07:05 Martin521

The IDE does not allocate a single large string every time you type a character. But something close to a string (something which can be read sequentially as a stream, and have efficient line+column indexing)

May 21 '25 07:05 T-Gro

Yes, that's a way to do it. But the whole compiler core (parse, type check etc) works on whole files and cannot process local changes. So, I wonder why a simple immutable System.String wouldn't be the right representation of source text. Changes in time are dealt with in higher layers.

I was thinking more in the context of this issue. To properly map a range to a substring via FileIndexTable without touching the filesystem, we need to keep the mapping updated one way or another. Probably, and more correctly, the whole FileIndexTable could be also just a snapshot in time, not a global mutable.

May 21 '25 08:05 majocha

I was thinking more in the context of this issue. To properly map a range to a substring via FileIndexTable without touching the filesystem, we need to keep the mapping updated one way or another. Probably, and more correctly, the whole FileIndexTable could be also just a snapshot in time, not a global mutable.

Ah, that makes sense. I guess it will be difficult to get around the global mutable FileIndexTable, given the current structure of the compiler. But it can be updated in the place you mentioned above.

EDIT: Thinking about it, it may be well possible to pass the table around as a snapshot, similar to what I am planning for #18553. And remove that nasty static global.

May 21 '25 08:05 Martin521

The IDE does not allocate a single large string every time you type a character. But something close to a string (something which can be read sequentially as a stream, and have efficient line+column indexing)

But wouldn't, when calling parseAndCheckFile, a single allocation for the source string be negligible compared to all the allocations that are done during lexing, parsing and type checking?

May 21 '25 08:05 Martin521

For the cost of allocation yes, but not the same for GC - entire source file (imagine typing a few new keystrokes every second) can be large enough to go to the LOH, where it has to live much longer to get collected.

This is why IDE and services around it do not keep a single string, but rather a tree of substrings (so when typing into a single line, only substring for that single line gets reallocated, and rest is reused)

May 21 '25 08:05 T-Gro