lsp icon indicating copy to clipboard operation
lsp copied to clipboard

Optimize `instance Eq NormalizedFilepath`?!

Open sjakobi opened this issue 3 months ago • 8 comments

https://github.com/haskell/lsp/blob/774f0e9de79f6136eedbf96f5589f86f4d3a8cdb/lsp-types/src/Language/LSP/Protocol/Types/Uri.hs#L201-L202

https://github.com/haskell/lsp/blob/774f0e9de79f6136eedbf96f5589f86f4d3a8cdb/lsp-types/src/Language/LSP/Protocol/Types/Uri.hs#L56-L57

I was looking at some -ddump-simpl output from ghcide and noticed that (==) for NormalizedFilePath is somewhat unwieldy. In this case we compare two NormalizedFilePaths named a1 and b1:

case a1 of { NormalizedFilePath ww2 ww3 ww4 ww5 ->
case ww2 of { NormalizedUri ww6 ww7 ->
case ww7 of { Text bx4 bx5 bx6 ->
case b1 of { NormalizedFilePath ww8 ww9 ww10 ww11 ->
case ww8 of { NormalizedUri ww12 ww13 ->
case ww13 of { Text bx3 bx7 bx8 ->
case ==# ww6 ww12 of {
  __DEFAULT -> (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
  1# ->
    case ==# bx6 bx8 of {
      __DEFAULT -> (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
      1# ->
        case compareByteArrays# bx4 bx5 bx3 bx7 bx6 of {
          __DEFAULT ->
            (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
          0# ->
            case ==# ww5 ww11 of {
              __DEFAULT ->
                (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
              1# ->
                case compareByteArrays# ww3 ww4 ww9 ww10 ww5 of {
                  __DEFAULT ->
                    (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
                  0# ->
                    (# |_ #) @ZeroBitRep @LiftedRep @(# #) @Target wild2

The steps are:

  1. Compare the hashes stored in the NormalizedUris
  2. Compare the lengths of the Text fields inside the NormalizedUris
  3. Compare the ByteArrays of the same Text fields
  4. Compare the lengths of the unpacked Text fields in the NormalizedFilePaths
  5. Compare the ByteArrays of these Text fields

The first step of comparing the hashes is especially useless in the context of a HashMap or HashSet operation: These operations only check for equality after asserting that the hashes are the same!

It also seems that it should be possible to get away with comparing only one of the Text fields, right? IIUC the NormalizedUri is basically internalNormalizedFilePathToUri nfp, where nfp is the NFP Text field.

So how about this?!

NormalizedFilePath _uri1 nfp1 == NormalizedFilePath _uri2 nfp2 = nfp1 == nfp2

sjakobi avatar Nov 14 '25 02:11 sjakobi

I like the proposal, but I think we should compare uri1 == uri2 since our filepaths often have long common prefixes (since they are absolute filepaths, all of them match for something like the first 20-30 characters), and the hash is a very fast way to check whether the contents can even be equal.

Changing the equality instance should likely also be benchmarked in HLS.

fendor avatar Nov 14 '25 08:11 fendor

I think we should compare uri1 == uri2

Does uri1 == uri2 imply nfp1 == nfp2? If so, it would be good to have this property documented!


I understand that comparing the hashes is a good first step when comparing arbitrary NFPs, but my suspicion is that – at least in HLS – the instance Eq NFP is used mostly for operations on HashMaps and HashSets, where the hash comparison is redundant.

One way to address this would be to use a specialized Eq instance for the HashMap and HashSet operations in HLS, e.g. with something like this:

newtype HashMapNFP = HashMapNFP NormalizedFilePath

instance Eq HashMapNFP where
  (HashMapNFP (NormalizedFilePath _ nfp1)) == (HashMapNFP (NormalizedFilePath _ nfp2)) = nfp1 == nfp2

newtype NFPMap v = NFPMap (HashMap HashMapNFP v)

lookup :: NormalizedFilePath -> NFPMap v -> Maybe v
lookup nfp = HM.lookup (HashMapNFP nfp)
...

But maybe that's overkill.

sjakobi avatar Nov 14 '25 17:11 sjakobi

since our filepaths often have long common prefixes (since they are absolute filepaths, all of them match for something like the first 20-30 characters)

Maybe it would be more efficient to compare them from end towards the beginning then? https://github.com/haskell/text/issues/670

sjakobi avatar Nov 14 '25 19:11 sjakobi

Thank you for thinking about this in such detail! Regarding text, HLS bindists do not ship with simd support due to being hard to ship to users and relying on the necessary system dependencies.

I think we should most simply measure the difference it makes to change the equality instance. Perhaps you can simply change the instance or it doesn't do anything since we usually use HashMap anyway...

Does uri1 == uri2 imply nfp1 == nfp2?

I would have thought so, but I am not sure. Similarly, I am not sure nfp1 == nfp2 implies uri1 == uri2, as I dont know how symbolic links are handled.

fendor avatar Nov 17 '25 10:11 fendor

Regarding text, HLS bindists do not ship with simd support due to being hard to ship to users and relying on the necessary system dependencies.

I think you are mixing text +simdutf (which is a C++ library for SIMD-powered validation of UTF-8) and just general SIMD enabled. HLS disables the former but most likely not the latter (how could it?), and, say, memcmp uses a system-provided libc, which is most likely SIMD-powered.

Bodigrim avatar Nov 17 '25 21:11 Bodigrim

Indeed, I am! Thank you very much for the clarification.

fendor avatar Nov 18 '25 09:11 fendor

I've made a PR: https://github.com/haskell/lsp/pull/630

To benchmark this in HLS, should I just follow this guide? https://haskell-language-server.readthedocs.io/en/latest/contributing/contributing.html#benchmarks

sjakobi avatar Nov 19 '25 10:11 sjakobi

yes 🙈 I actually have no idea though, how up-to-date this is

fendor avatar Nov 19 '25 12:11 fendor