Can't see range text in FCS tests
It's difficult to analyze the tree node sources in the tests:
A possible fix and other options suggestions have been discussed in #13403.
Let's see if copilot can figure it out
There are only a few places in the codebase where the source text and associated filename meet. If we identify one that occurs early enough that we can hope it covers most of our use cases, for example here:
https://github.com/dotnet/fsharp/blob/087ffa2d42579baec3b6adcd9b8f568c7dd14ccb/src/Compiler/SyntaxTree/LexHelpers.fs#L116
we can just cache the contents keyed by filename. Make that cache exist only in debug mode and visible from Range.DebugCode.
we can just cache the contents keyed by filename
Would we want to add another layer of complexity? I'd rather want to remove all file system access from the compiler core and have string-based sources only. But that's of course a bigger task.
we can just cache the contents keyed by filename
Would we want to add another layer of complexity? I'd rather want to remove all file system access from the compiler core and have string-based sources only. But that's of course a bigger task.
Absolutely agreed. But that would be orthogonal to the issue. In fact, this way we could at least remove the current debug mechanism of reading from filesystem in Range.DebugCode.
Just to chime in, we use the File System shim heavily in Rider for the file sources, since we have to use the IntelliJ virtual file system instead of the normal one.
I think that fits well.
The good idea really is to eliminate direct FS access from the compiler itself, and only work with ISourceText and other abstraction (some of which might not yet exist). And the abstraction might as well be coming with a default implementation being an FileSystemProvider or whatever.
Is the complexity/benefit ration good here?
Yes, I think so. At some level (like fsc and FSharpChecker) of course we need to deal with file system IO. But not deep down in range or CompilerDiagnostics.
I believe we would need a type Source which contains a name (unique in the project, typically the source file path) and the source text (or ISourceText). The FileIndexTable would then map FileIndex and Source (rather than FileIndex and FileName). We then have source access through range and no longer need, deep in the compiler, any re-reads from the file system.
Since ISourceText represents a point in time it would also need a notion of change, so basically this https://learn.microsoft.com/en-us/dotnet/api/microsoft.codeanalysis.text.sourcetextcontainer?view=roslyn-dotnet-4.13.0 with a optional filename.
Yes, that's a way to do it.
But the whole compiler core (parse, type check etc) works on whole files and cannot process local changes. So, I wonder why a simple immutable System.String wouldn't be the right representation of source text. Changes in time are dealt with in higher layers.
The IDE does not allocate a single large string every time you type a character. But something close to a string (something which can be read sequentially as a stream, and have efficient line+column indexing)
Yes, that's a way to do it. But the whole compiler core (parse, type check etc) works on whole files and cannot process local changes. So, I wonder why a simple immutable
System.Stringwouldn't be the right representation of source text. Changes in time are dealt with in higher layers.
I was thinking more in the context of this issue. To properly map a range to a substring via FileIndexTable without touching the filesystem, we need to keep the mapping updated one way or another. Probably, and more correctly, the whole FileIndexTable could be also just a snapshot in time, not a global mutable.
I was thinking more in the context of this issue. To properly map a range to a substring via
FileIndexTablewithout touching the filesystem, we need to keep the mapping updated one way or another. Probably, and more correctly, the wholeFileIndexTablecould be also just a snapshot in time, not a global mutable.
Ah, that makes sense. I guess it will be difficult to get around the global mutable FileIndexTable, given the current structure of the compiler. But it can be updated in the place you mentioned above.
EDIT: Thinking about it, it may be well possible to pass the table around as a snapshot, similar to what I am planning for #18553. And remove that nasty static global.
The IDE does not allocate a single large string every time you type a character. But something close to a string (something which can be read sequentially as a stream, and have efficient line+column indexing)
But wouldn't, when calling parseAndCheckFile, a single allocation for the source string be negligible compared to all the allocations that are done during lexing, parsing and type checking?
For the cost of allocation yes, but not the same for GC - entire source file (imagine typing a few new keystrokes every second) can be large enough to go to the LOH, where it has to live much longer to get collected.
This is why IDE and services around it do not keep a single string, but rather a tree of substrings (so when typing into a single line, only substring for that single line gets reallocated, and rest is reused)