tsearch
Search algorithm
The initial PoC version of the search algorithm was very naive, and it is currently broken as a side effect of the improvements in the extract.
Being able to search is the whole idea of the project, so this should be the main focus.
There are some insights about the Hoogle search algorithm in this episode of The Haskell Cast.
A few things to consider:
- Semantics of TS should be taken into consideration (e.g. `any` should prioritize matching `any` but should also include everything else).
- We should optimize the extracted data for search; currently it is just one big array :smirk:
- Let's start with a naive version first that doesn't consider all the semantics (I don't think the goal should ever be to take all the semantics into consideration for search) but can already produce some meaningful results, and then iterate on small improvements.
- For this issue to be solved we don't need an implementation but just a design of the first version of the search algorithm.
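To make the `any` point above concrete, here is a minimal sketch of how a single type comparison could be scored. It is only an illustration for the design discussion: `typeMatchCost` and `rankByType` are hypothetical names, types are represented as plain strings, and nothing here reflects the actual tsearch code.

```typescript
// Hypothetical scoring for one type comparison (lower cost = better match).
// Assumption: types are plain strings such as "any", "string", "number".
function typeMatchCost(queryType: string, candidateType: string): number {
  if (queryType === candidateType) return 0; // exact match, including `any` vs `any`
  if (queryType === "any") return 1; // `any` in the query still accepts every other type
  return Infinity; // no match in this naive version
}

// Keep every candidate that matches at all, best matches first.
function rankByType(queryType: string, candidates: string[]): string[] {
  return candidates
    .map((type) => ({ type, cost: typeMatchCost(queryType, type) }))
    .filter((entry) => entry.cost !== Infinity)
    .sort((a, b) => a.cost - b.cost)
    .map((entry) => entry.type);
}
```

With this scoring, a query for `any` lists `any` candidates first and still returns every other type after them, while a query for `string` returns only `string`.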
There are some details about how Hoogle handles this here. What I get from that is that it uses an approach similar to the one used to compute the edit distance between two strings, but here the operations are things like "reorder the parameters" or "change some type".
I think doing something like that as a first approach should be enough. You can even have the same cost for each operation (later these costs can be hand-picked to get nicer orders).
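As a rough sketch of that idea, assuming the same cost for every operation: the `Signature` shape, the `COST` table and `signatureDistance` below are hypothetical, and the matching is a simple greedy approximation rather than a true edit distance or Hoogle's actual algorithm.

```typescript
// Hypothetical, simplified view of a function signature.
interface Signature {
  params: string[];
  returnType: string;
}

// Same cost for every operation for now; these can be hand-tuned later.
const COST = { changeType: 1, reorderParams: 1, addOrRemoveParam: 1 };

// Greedy estimate of how many operations turn `query` into `candidate`.
function signatureDistance(query: Signature, candidate: Signature): number {
  const used = new Array<boolean>(candidate.params.length).fill(false);
  const matchedInQueryOrder: string[] = [];
  let unmatchedQuery = 0;

  // Pair each query parameter with an unused candidate parameter of the same type.
  for (const param of query.params) {
    const index = candidate.params.findIndex((type, i) => !used[i] && type === param);
    if (index === -1) {
      unmatchedQuery++;
    } else {
      used[index] = true;
      matchedInQueryOrder.push(param);
    }
  }

  const matchedInCandidateOrder = candidate.params.filter((_, i) => used[i]);
  const unmatchedCandidate = candidate.params.length - matchedInCandidateOrder.length;

  // Leftovers on both sides count as type changes; the surplus as additions/removals.
  const changed = Math.min(unmatchedQuery, unmatchedCandidate);
  const addedOrRemoved = Math.abs(unmatchedQuery - unmatchedCandidate);
  // A single reorder is charged if the matched parameters appear in a different order.
  const reordered = matchedInQueryOrder.some((type, i) => type !== matchedInCandidateOrder[i]);

  return (
    (query.returnType === candidate.returnType ? 0 : COST.changeType) +
    changed * COST.changeType +
    addedOrRemoved * COST.addOrRemoveParam +
    (reordered ? COST.reorderParams : 0)
  );
}
```

Search results could then be sorted by ascending distance to the query. For example, `(string, number) => boolean` against `(number, string) => boolean` costs one reorder, while an identical signature costs zero.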
Here are some notes/thoughts on the algorithm. I will keep adding stuff there until I feel it makes sense to start with a PoC.
(cc) @fvictorio