helix feat: Find diagnostics for both `C:` and `c:` forms of a path

Document::url() uses Document::path() which is just a reference to Document.path. Since LSPs can send either a path with an uppercase or lowercase drive letter, we need to check for the other option when the first fails.

We do this throughout Helix by introducing NormalizedUrl which takes care of the conversion and delegates to its base Url when needed. If future conversions are needed, they can be added here instead of having to repeat them all over the place.

Closes #3267

Aug 06 '22 23:08 poliorcetics

I checked all uses of Document::url(), the main other case is for symbols, and I don't think it will be affected then ? Or at least, it won't lose symbols, only show a path when it could avoid it (because it's for the current file), but I'm not certain.

Aug 06 '22 23:08 poliorcetics

I think this is a more general problem than just this instance. Anytime URLs are compared you risk issues, for example in diag_picker the current_path is compared again with the URLs from the LSP server.

My idea would be to create a newtype NormalizedUrl (naming to be determined) that wraps the URL and does the normalisation. This type will uphold the invariant that the drive letter is upper case, but will otherwise be a pretty thin wrapper around Url. A bonus is that if any new normalisations are discovered they can done here as well.

We would then replace all usages of Url in helix with NormalizedUrl. Code using for example Eq will then just work basically unmodified.

Aug 07 '22 13:08 Frojdholm

@poliorcetics This is a draft of what I've been working on. It's quite clunky since URLs are often used to interface with other types in the lsp_types crate. The normalizing function is also a bit of a hack at the moment, but it works on my machine. Either way we will need a newtype if we want to store the URL in a hash map since the hash and eq methods are used for finding the keys.

https://github.com/helix-editor/helix/compare/23f4a40a7a037948935aea7f24e20319029d174e...Frojdholm:helix:draft-3267-normalized-url

Aug 09 '22 16:08 Frojdholm

@Frojdholm I had already started on your suggestion on my side, I looked at yours in diagonal, it seems like we got similar solutions, except for the C:/c: check.

Edit: thanks anyway for the work and the suggestion, both are appreciated 👍

Aug 09 '22 22:08 poliorcetics

I have not yet pushed my changes, I was benchmarking both approach (normalizing at init and storing the result vs normalizing during comparison).

The worst case in both approach is a Windows path URL of the form file://C:/my/neat/path or file:///<same here>, because this is the form that has to be normalized for later comparison. All other forms of comparison, notably those with different schemes for the URLs, are very fast, less than a hundred (100) nanoseconds at worst, in picoseconds for the best case on my machine (when the first letter of the scheme is different, e.g, https and file).

So, worst case: file://C:/my/neat/path. This is still very fast, despite needing to .clone() for the benchmarks because Urls are not Copy.

When normalizing during construction, cloning+normalizing (which needs to generate a new URL with c:, different from the base one), takes ~315ns.
When normalizing during comparison, the same cloning and normalizing takes ~350ns, and that's because my test urls are small, if I add a few levels of depth to my test paths, it gets worse (still in the hundreds of nanoseconds, but worse all the same).

I expect performances to be worse on Windows because the allocator is (reputedly) not as good and most machines are not a Macbook Pro with the biggest M1 Pro, but even at twice the time, this is still good.

I'll continue on the road of normalizing at construction because overall all the comparisons are faster with it and the code is clearer IMO (e.g the comparison function for the normalize-during-compare case needs a little help to be faster, and it still does not reach the pre-normalized case by ~10%). On a macOS install, with local paths (so no drive letter C:), constructing, even with normalizing takes about 40ns and so shouldn't affect helix's performance at all, especially compared to reading the file itself.

Of course my benchmarks are probably not that representative, but they at least show both approach are sufficiently fast and the first one will do much less small and repeated allocations.

Aug 14 '22 14:08 poliorcetics

Rebased on recent master, it was conflicting

Sep 04 '22 15:09 poliorcetics

\cc @pascalkuthe

Sorry for the delay on this PR, I've been hoping to find a way to do this without an additional wrapper type.

Feb 17 '23 06:02 archseer

The official URI reference implementation for the LSP spec is aware of this problem and even has tests for it. They seem to simply provide an option to normalize the drive letter when converting a URI to a path.

More importantly when an URI that is encoded to a string (which is done when it's sent server -> client/client -> serve) the driver litter is always converted to lower case:

https://github.com/microsoft/vscode-uri/blob/1af8b2576342c7e6deff4d4a2c2e6cdfda170ff2/src/uri.ts#L637

Sadly the rust URL crate exposed by LSP types doesn't seem to perform any kind of normalization like this at any point. That is probably because this specific property is specific to LSP URIs but lsp_types just uses the URL crate regardless. There seem to be mysterious historic reasons for this: https://github.com/microsoft/vscode/issues/42159#issuecomment-360791384

Given that the driver letters sent by any LSP conformant server (or at-least those that work with VSCode?) should always be lowercase it should hopefully be enough for us to be careful to always lowercase the driver letters of any paths before turning them into an URL (probably enough to have a utility function for that in helix-core.

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Regardless I am not a fan of introducing a new normalized path type that stores two versions of the URI. We should just normalize all our URIs to be lower-cased to match VSCode and the spec here

Feb 18 '23 01:02 pascalkuthe

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Yeah that won't happen. Relying on that when the rest of the computing world uses uppercase letters was a mistake from the LSP spec IMO. We can avoid the problem if we lowercase drive letters everywhere but we absolutely shouldn't rely on LSPs doing the right thing

Feb 21 '23 15:02 poliorcetics

Actually an even easier solution: Do we really need to store an URI instead of just a Path? Rust PathBuf and Path correctly handles this case. Is there any reason to store an URL over just converting everything to PathBuf (we still need to normalize before sending ofcourse)?

Feb 22 '23 13:02 pascalkuthe

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

Feb 22 '23 15:02 poliorcetics

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (https://github.com/helix-editor/helix/issues/3721#issuecomment-1237605939) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

Feb 22 '23 15:02 pascalkuthe

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (https://github.com/helix-editor/helix/issues/3721#issuecomment-1237605939) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

Feb 22 '23 15:02 poliorcetics

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

I am not convinced that remote editing belongs in the core editor (that issue is labeled with A-plugin) and even if it does supporting that will require much larger changes to the way we represent buffer paths anyway. I don't think it makes sense to introduce a bunch of code for a new wrapper type (https://github.com/helix-editor/helix/pull/3347#issuecomment-1434155542) for a hypothetical future feature right now.

The future proofing argument doesn't really make sense here as helix fundamentally only deals in Paths anyway and it's (in my opinion) a bug rather than a feature that we compared URIs at all. If you look at how we actually use URIs in the codebase there are currently exactly three operations that we perform on URIs:

send them to the LS server, URIs are always generated from Document::path (which is a PathBuf). This usecase is what Docuemnt::uri() was intended for.
Convert a URI from the server to a Path with to_file_path for local use.
Comparison with another URI. However that other URI is always generated from Document::path and hence a PathBuf/file:// URI

All usecases only work with file::// URIs anyway so we might aswell switch all occurrences of the third case to the second case to be consistent and to fix problems like https://github.com/helix-editor/helix/issues/3267 (in general I think a simple URL comparison will fail). I think there will actually be more edgecases like this

Feb 22 '23 16:02 pascalkuthe

helix helix copied to clipboard

feat: Find diagnostics for both `C:` and `c:` forms of a path

helix
helix copied to clipboard