unison
                                
                                 unison copied to clipboard
                                
                                    unison copied to clipboard
                            
                            
                            
                        Error Messages: on name resolution failure, suggest similar names
Overview
This change improves error messages when the user provides a name in Unison source code files that does not exactly match a known name in the codebase or within the file. Specifically, it suggests that the user try one of the top 3 closest name matches by Levenshtein edit distance. This is intended to aid users in more quickly resolving typos and similar errors.
The PR introduces a few new suggestion categories that align with this fuzzy matching behaviour. Suggestions for matches with a similar name and the right type are presented first if possible. Otherwise, suggestions for similar name but wrong type are presented.
For example,
-- Should match truncate and truncate0 though they have different types
myFunction : Float -> Nat
myFunction = Flat.TRuncate2x4
  Loading changes detected in scratch.u.
  I couldn't figure out what Flat.TRuncate2x4 refers to here:
      3 | myFunction = Flat.TRuncate2x4
  I found one or more terms in scope with similar names but the wrong types.
  If you meant to use one these, try using it instead and then adjusting types:
  truncate : Float -> Int
  truncate0 : Int -> Nat
Changes to Error Message Preambles
Additionally, suggestions in error messages on name resolution failure now use plurality-agnostic messages for simplicity and consistency. For example we now have,
  I found one or more terms in scope with similar names and the right types.
  If you meant to use one of these, try using it instead:
For example, this simplifies some of the error message code from:
    wrongTypeText pl =
      Pr.paragraphyText
        ( mconcat
            [ "I found ",
              pl "a term" "some terms",
              " in scope with ",
              pl "a " "",
              "matching name",
              pl "" "s",
              " but ",
              pl "a " "",
              "different type",
              pl "" "s",
              ". ",
              "If ",
              pl "this" "one of these",
              " is what you meant, try using its full name:"
            ]
        )
        <> "\n\n"
to
  rightNameWrongTypeText _ =
    mconcat
      [ "I found one or more terms in scope with the ",
        Pr.bold "right names ",
        "but the ",
        Pr.bold "wrong types.",
        "\n",
        "If you meant to use one of these, try using it with its full name and then adjusting types",
        ":\n\n"
      ]
Relevant Issues
Partially addresses #713.
Implementation notes
How does it accomplish it, in broad strokes? i.e. How does it change the Haskell codebase?
- Updates Unison.FileParsers, functioncomputeTypecheckingEnvironment, to produce aTypechecker.Envwith a new fieldfreeNameToFuzzyTermsByShortName.- The field freeNameToFuzzyTermsByShortNameis a mapping from the free names in the Unison file that needs to be type-checked to a mapping from short names to terms (similar to the fieldtermsByShortname).- We need to group the terms by free name so that we don't mix fuzzy matches between one free name and another giving poor suggestions.
 
- Adds function fuzzyFindByEditDistanceRankedused to produce fuzzy matches ranked by edit distance. This function uses two new functions inUnison.Names,queryEditDistancesandqueryEditDistances'that search throughNamesandSet Namerespectively to get edit distnaces.
 
- The field 
- Updates Unison.Typecheckersuch that theresolvefunction used byresolveNoteand in turn bytypeDirectedNameResolutionto produce and append fuzzy match suggestions to theResolutiondata type.- Suggestion types were expanded in Unison.Typechecker.Contextto include theSimilarNameRightTypeandSimilarNameWrongTypecases.
 
- Suggestion types were expanded in 
- Updates Unison.PrintErrorto handle printing of the new suggestion types.
Interesting/controversial decisions
1
I chose fuzzy matching by edit distance because it is usually more appropriate than FZF-style fuzzy finding for suggesting typos or other user errors. I tried FZF-style matching initially since existing code used it, but it would sometimes produce strange matches. Additionally, the FZF-style matching code is significantly slower and currently includes a prefilter that restricts matching to substrings and is not as useful.
2
A tricky piece of improving fuzzy match results was preprocessing names in the search space to the last N name segments where N is the number of name segments in the free name the user provided. This helped reduce edit distance bias towards shorter, fully-qualified names. For example, a deeply nested name like foo.bar.biz.bite would be deprioritized compared to a name like byz.byte if the user tried to use bete or baz.bete despite the edit distance being the same when considering only the last N segments. This could perhaps be improved by also considering aliases and qualified imports, but the adjustment here proved relatively simple and effective.
This behaviour is covered by one of the transcript tests.
3
Somewhat arbitrarily, I chose to show only the top 3 matches by edit distance.
Additionally, I decided to filter out matches with large edit distance. Here I made the decision after some manual testing to go with the following max score formula:
    -- Expect edit distances (number of typos) to be about half the length of the name being queried
    -- But clamp max edit distance to work well with very short names
    -- and keep ranking reasonably fast when a verbose name is queried
    maxScore = clamp (3, 16) $ Text.length (nameToText name) `ceilingDiv` 2
4
There may be some unintended impact to the dependence of some transcript tests to the global namespace. When the global namespace changes, fuzzy matching results could change and this may change transcript test outputs. This will usually change in ucm transcripts with the :error directive so the change should be automatically captured, but needs to be reviewed.
Test coverage
Some existing transcript tests cover name resolution failure error messages.
A new transcript name-resolution-fuzzy.md was added to more thoroughly cover all fuzzy name resolution scenarios.
As noted in this transcript, there is another case where the term matches with terms with the wrong name but the right type. This case was not added here because it is unclear how to produce a concrete example. This kind of suggestion does not seem to be produced in the code paths in Unison.TypeChecker.
Loose ends
1
It would be nice to have LSP code actions for accepting suggestions. This should be a separate ticket, and the design here hopefully doesn't conflict with that goal, but I don't have enough experience in the codebase to know.
2
It would be nice if LSP error reporting did not stop on the first error in the file. Although in the UCM it may be overwhelming to print all errors, it is useful while working on source code to be able to see all error hints so the user can work through errors in an order that works for them.
3
When name resolution fails it normally prints
  I also don't know what type it should be.
  Some common causes of this error include:
    * Your current namespace is too deep to contain the
      definition in its subtree
    * The definition is part of a library which hasn't been
      added to this project
    * You have a typo in the name
However, when suggestions are available this isn't printed. If none of the suggestions make sense, then it might be good to for the user to be shown more general guidance, perhaps after the suggestions.
4
This PR addresses some of the situations documented by @seagreen in #713, but not all of them.
Example 1 - Local Name Resolution (Directly Addressed)
aaaaa : Boolean
aaaaa = true
f : ()
f =
  if aaaaA then () else ()
Now gives the more helpful suggestions:
  I couldn't figure out what aaaaA refers to here:
      6 |   if aaaaA then () else ()
  I found one or more terms in scope with similar names and the right types.
  If you meant to use one of these, try using it instead:
  aaaaa : Boolean
Example 2 - Type Symbol Resolution (Not Addressed)
The following case where the name of the type is wrong (Bool instead of Boolean) is not addressed here as type symbol resolution seems to occur before term resolution and needs additional work.
x : Bool
x = true
Example 3 - Global Name Resolution (Partially Addressed)
Unfortunately, some builtins like true don't seem to be readily found in the parsing environment and global namespace so the following provides some awkward suggestions:
x : Boolean
x = True
  I found one or more terms in scope with similar names but the wrong types.
  If you meant to use one of these, try using it instead and then adjusting types:
  Type : Type -> Link
  Pattern.run : Pattern a -> a -> Optional ([a], a)
  Scope.run : (Unit ->{Scope s, g} r) ->{g} r
However, many other global names can be found and this scenario usually works. See the example above with truncate in the overview.