helix
helix copied to clipboard
feat: Find diagnostics for both `C:` and `c:` forms of a path
Document::url()
uses Document::path()
which is just a reference to Document.path
. Since LSPs
can send either a path with an uppercase or lowercase drive letter, we need to check for the other
option when the first fails.
We do this throughout Helix by introducing NormalizedUrl
which takes care of the conversion and
delegates to its base Url
when needed. If future conversions are needed, they can be added here
instead of having to repeat them all over the place.
Closes #3267
I checked all uses of Document::url()
, the main other case is for symbols, and I don't think it will be affected then ? Or at least, it won't lose symbols, only show a path when it could avoid it (because it's for the current file), but I'm not certain.
I think this is a more general problem than just this instance. Anytime URLs are compared you risk issues, for example in diag_picker
the current_path
is compared again with the URLs from the LSP server.
My idea would be to create a newtype NormalizedUrl
(naming to be determined) that wraps the URL and does the normalisation. This type will uphold the invariant that the drive letter is upper case, but will otherwise be a pretty thin wrapper around Url
. A bonus is that if any new normalisations are discovered they can done here as well.
We would then replace all usages of Url
in helix with NormalizedUrl
. Code using for example Eq
will then just work basically unmodified.
@poliorcetics This is a draft of what I've been working on. It's quite clunky since URLs are often used to interface with other types in the lsp_types
crate. The normalizing function is also a bit of a hack at the moment, but it works on my machine. Either way we will need a newtype if we want to store the URL in a hash map since the hash and eq methods are used for finding the keys.
https://github.com/helix-editor/helix/compare/23f4a40a7a037948935aea7f24e20319029d174e...Frojdholm:helix:draft-3267-normalized-url
@Frojdholm I had already started on your suggestion on my side, I looked at yours in diagonal, it seems like we got similar solutions, except for the C:
/c:
check.
Edit: thanks anyway for the work and the suggestion, both are appreciated 👍
I have not yet pushed my changes, I was benchmarking both approach (normalizing at init and storing the result vs normalizing during comparison).
The worst case in both approach is a Windows path URL of the form file://C:/my/neat/path
or file:///<same here>
, because this is the form that has to be normalized for later comparison. All other forms of comparison, notably those with different schemes for the URLs, are very fast, less than a hundred (100) nanoseconds at worst, in picoseconds for the best case on my machine (when the first letter of the scheme is different, e.g, https
and file
).
So, worst case: file://C:/my/neat/path
. This is still very fast, despite needing to .clone()
for the benchmarks because Url
s are not Copy
.
- When normalizing during construction, cloning+normalizing (which needs to generate a new URL with
c:
, different from the base one), takes ~315ns. - When normalizing during comparison, the same cloning and normalizing takes ~350ns, and that's because my test urls are small, if I add a few levels of depth to my test paths, it gets worse (still in the hundreds of nanoseconds, but worse all the same).
I expect performances to be worse on Windows because the allocator is (reputedly) not as good and most machines are not a Macbook Pro with the biggest M1 Pro, but even at twice the time, this is still good.
I'll continue on the road of normalizing at construction because overall all the comparisons are faster with it and the code is clearer IMO (e.g the comparison function for the normalize-during-compare case needs a little help to be faster, and it still does not reach the pre-normalized case by ~10%). On a macOS install, with local paths (so no drive letter C:
), constructing, even with normalizing takes about 40ns and so shouldn't affect helix's performance at all, especially compared to reading the file itself.
Of course my benchmarks are probably not that representative, but they at least show both approach are sufficiently fast and the first one will do much less small and repeated allocations.
Rebased on recent master, it was conflicting
\cc @pascalkuthe
Sorry for the delay on this PR, I've been hoping to find a way to do this without an additional wrapper type.
The official URI reference implementation for the LSP spec is aware of this problem and even has tests for it.
They seem to simply provide an option to normalize the drive letter when converting a URI
to a path.
More importantly when an URI that is encoded to a string (which is done when it's sent server -> client/client -> serve) the driver litter is always converted to lower case:
https://github.com/microsoft/vscode-uri/blob/1af8b2576342c7e6deff4d4a2c2e6cdfda170ff2/src/uri.ts#L637
Sadly the rust URL crate exposed by LSP types doesn't seem to perform any kind of normalization like this at any point.
That is probably because this specific property is specific to LSP URIs but lsp_types
just uses the URL
crate regardless.
There seem to be mysterious historic reasons for this: https://github.com/microsoft/vscode/issues/42159#issuecomment-360791384
Given that the driver letters sent by any LSP conformant server (or at-least those that work with VSCode?) should always be lowercase it should hopefully be enough for us to be careful to always lowercase the driver letters of any paths before turning them into an URL
(probably enough to have a utility function for that in helix-core
.
I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.
Regardless I am not a fan of introducing a new normalized path type that stores two versions of the URI. We should just normalize all our URIs to be lower-cased to match VSCode and the spec here
I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.
Yeah that won't happen. Relying on that when the rest of the computing world uses uppercase letters was a mistake from the LSP spec IMO. We can avoid the problem if we lowercase drive letters everywhere but we absolutely shouldn't rely on LSPs doing the right thing
Actually an even easier solution: Do we really need to store an URI instead of just a Path
? Rust PathBuf
and Path
correctly handles this case. Is there any reason to store an URL
over just converting everything to PathBuf
(we still need to normalize before sending ofcourse)?
Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt
) ? If so, using PathBuf
would be detrimental
Does helix intend to support remote editing one day (like
hx scp://user@remote[:port]//path/to/file.txt
) ? If so, usingPathBuf
would be detrimental
I think @archseer sentiment on the topic (https://github.com/helix-editor/helix/issues/3721#issuecomment-1237605939) was mostly to use sshfs
or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability
Does helix intend to support remote editing one day (like
hx scp://user@remote[:port]//path/to/file.txt
) ? If so, usingPathBuf
would be detrimentalI think @archseer sentiment on the topic (https://github.com/helix-editor/helix/issues/3721#issuecomment-1237605939) was mostly to use
sshfs
or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability
The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.
Lowercasing drive letters all the time seems more forward-compatible than using PathBuf
here, I can change NormalizedUrl
to do that instead (and so not allocate twice)
Does helix intend to support remote editing one day (like
hx scp://user@remote[:port]//path/to/file.txt
) ? If so, usingPathBuf
would be detrimentalI think @archseer sentiment on the topic (#3721 (comment)) was mostly to use
sshfs
or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capabilityThe arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.
Lowercasing drive letters all the time seems more forward-compatible than using
PathBuf
here, I can changeNormalizedUrl
to do that instead (and so not allocate twice)
I am not convinced that remote editing belongs in the core editor (that issue is labeled with A-plugin
) and even if it does supporting that will require much larger changes to the way we represent buffer paths anyway. I don't think it makes sense to introduce a bunch of code for a new wrapper type (https://github.com/helix-editor/helix/pull/3347#issuecomment-1434155542) for a hypothetical future feature right now.
The future proofing argument doesn't really make sense here as helix fundamentally only deals in Path
s anyway and it's (in my opinion) a bug rather than a feature that we compared URIs at all. If you look at how we actually use URIs
in the codebase there are currently exactly three operations that we perform on URIs:
- send them to the LS server,
URI
s are always generated fromDocument::path
(which is aPathBuf
). This usecase is whatDocuemnt::uri()
was intended for. - Convert a
URI
from the server to aPath
withto_file_path
for local use. - Comparison with another
URI
. However that otherURI
is always generated fromDocument::path
and hence aPathBuf
/file://
URI
All usecases only work with file:://
URIs anyway so we might aswell switch all occurrences of the third case to the second case to be consistent and to fix problems like https://github.com/helix-editor/helix/issues/3267 (in general I think a simple URL comparison will fail). I think there will actually be more edgecases like this