Optimize `instance Eq NormalizedFilepath`?!
https://github.com/haskell/lsp/blob/774f0e9de79f6136eedbf96f5589f86f4d3a8cdb/lsp-types/src/Language/LSP/Protocol/Types/Uri.hs#L201-L202
https://github.com/haskell/lsp/blob/774f0e9de79f6136eedbf96f5589f86f4d3a8cdb/lsp-types/src/Language/LSP/Protocol/Types/Uri.hs#L56-L57
I was looking at some -ddump-simpl output from ghcide and noticed that (==) for NormalizedFilePath is somewhat unwieldy. In this case we compare two NormalizedFilePaths named a1 and b1:
case a1 of { NormalizedFilePath ww2 ww3 ww4 ww5 ->
case ww2 of { NormalizedUri ww6 ww7 ->
case ww7 of { Text bx4 bx5 bx6 ->
case b1 of { NormalizedFilePath ww8 ww9 ww10 ww11 ->
case ww8 of { NormalizedUri ww12 ww13 ->
case ww13 of { Text bx3 bx7 bx8 ->
case ==# ww6 ww12 of {
__DEFAULT -> (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
1# ->
case ==# bx6 bx8 of {
__DEFAULT -> (# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
1# ->
case compareByteArrays# bx4 bx5 bx3 bx7 bx6 of {
__DEFAULT ->
(# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
0# ->
case ==# ww5 ww11 of {
__DEFAULT ->
(# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
1# ->
case compareByteArrays# ww3 ww4 ww9 ww10 ww5 of {
__DEFAULT ->
(# _| #) @ZeroBitRep @LiftedRep @(# #) @Target (##);
0# ->
(# |_ #) @ZeroBitRep @LiftedRep @(# #) @Target wild2
The steps are:
- Compare the hashes stored in the
NormalizedUris - Compare the lengths of the
Textfields inside theNormalizedUris - Compare the
ByteArrays of the sameTextfields - Compare the lengths of the unpacked
Textfields in theNormalizedFilePaths - Compare the
ByteArrays of theseTextfields
The first step of comparing the hashes is especially useless in the context of a HashMap or HashSet operation: These operations only check for equality after asserting that the hashes are the same!
It also seems that it should be possible to get away with comparing only one of the Text fields, right? IIUC the NormalizedUri is basically internalNormalizedFilePathToUri nfp, where nfp is the NFP Text field.
So how about this?!
NormalizedFilePath _uri1 nfp1 == NormalizedFilePath _uri2 nfp2 = nfp1 == nfp2
I like the proposal, but I think we should compare uri1 == uri2 since our filepaths often have long common prefixes (since they are absolute filepaths, all of them match for something like the first 20-30 characters), and the hash is a very fast way to check whether the contents can even be equal.
Changing the equality instance should likely also be benchmarked in HLS.
I think we should compare
uri1 == uri2
Does uri1 == uri2 imply nfp1 == nfp2? If so, it would be good to have this property documented!
I understand that comparing the hashes is a good first step when comparing arbitrary NFPs, but my suspicion is that – at least in HLS – the instance Eq NFP is used mostly for operations on HashMaps and HashSets, where the hash comparison is redundant.
One way to address this would be to use a specialized Eq instance for the HashMap and HashSet operations in HLS, e.g. with something like this:
newtype HashMapNFP = HashMapNFP NormalizedFilePath
instance Eq HashMapNFP where
(HashMapNFP (NormalizedFilePath _ nfp1)) == (HashMapNFP (NormalizedFilePath _ nfp2)) = nfp1 == nfp2
newtype NFPMap v = NFPMap (HashMap HashMapNFP v)
lookup :: NormalizedFilePath -> NFPMap v -> Maybe v
lookup nfp = HM.lookup (HashMapNFP nfp)
...
But maybe that's overkill.
since our filepaths often have long common prefixes (since they are absolute filepaths, all of them match for something like the first 20-30 characters)
Maybe it would be more efficient to compare them from end towards the beginning then? https://github.com/haskell/text/issues/670
Thank you for thinking about this in such detail! Regarding text, HLS bindists do not ship with simd support due to being hard to ship to users and relying on the necessary system dependencies.
I think we should most simply measure the difference it makes to change the equality instance. Perhaps you can simply change the instance or it doesn't do anything since we usually use HashMap anyway...
Does uri1 == uri2 imply nfp1 == nfp2?
I would have thought so, but I am not sure. Similarly, I am not sure nfp1 == nfp2 implies uri1 == uri2, as I dont know how symbolic links are handled.
Regarding
text, HLS bindists do not ship with simd support due to being hard to ship to users and relying on the necessary system dependencies.
I think you are mixing text +simdutf (which is a C++ library for SIMD-powered validation of UTF-8) and just general SIMD enabled. HLS disables the former but most likely not the latter (how could it?), and, say, memcmp uses a system-provided libc, which is most likely SIMD-powered.
Indeed, I am! Thank you very much for the clarification.
I've made a PR: https://github.com/haskell/lsp/pull/630
To benchmark this in HLS, should I just follow this guide? https://haskell-language-server.readthedocs.io/en/latest/contributing/contributing.html#benchmarks
yes 🙈 I actually have no idea though, how up-to-date this is