construct-hub-webapp

The term 'aws' should be ignored by the search algorithm

Open addihorowitz opened this issue 3 years ago • 4 comments

When visitors search for 'AWS codepipeline', they get 688 results, most of which are not relevant.

If you search instead for 'codepipeline', you get 27 relevant results.

addihorowitz, Dec 26 '21 13:12

I'm not sure we can implement this without breaking a lot of other title-based searches, such as aws-cdk or aws-solutions. The results at the top for "aws codepipeline" are still the most relevant ones. I have a PR that explores the suggested solution, but in testing I've found the search experience to be much worse.

gabewomble, Dec 29 '21 18:12

I believe there's a difference between "aws-X" and "aws X". The first is a single term, 'aws-X'; the second is two terms, 'aws' and 'X'. If a user types the word "aws" (not the prefix "aws", but a standalone term equal to "aws"), we should ignore it.
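To make the distinction concrete, here is a minimal sketch of the proposed behavior, assuming the query is split on whitespace before any other processing. The names `IGNORED_TERMS` and `stripIgnoredTerms` are hypothetical and are not taken from the construct-hub-webapp code.

```typescript
// Hypothetical sketch of the proposal, not the actual Construct Hub code.
// Terms that should be ignored when they appear as standalone words.
const IGNORED_TERMS: string[] = ["aws"];

// Drops whitespace-delimited terms that are exactly equal to an ignored term.
// "aws-cdk" is kept because, at this stage, it is still a single term.
function stripIgnoredTerms(query: string): string {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  const kept = terms.filter(
    (t) => IGNORED_TERMS.indexOf(t.toLowerCase()) === -1
  );
  // Assumption: if every term was ignored, fall back to the original query
  // so that searching for just "aws" still returns something.
  return kept.length > 0 ? kept.join(" ") : terms.join(" ");
}
```

With this sketch, `stripIgnoredTerms("AWS codepipeline")` yields `"codepipeline"`, while `stripIgnoredTerms("aws-cdk")` is left unchanged. The filter only works if it runs before any step that splits terms on hyphens.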

addihorowitz, Jan 09 '22 14:01

The issue is that this conflicts entirely with how the search engine works. All search terms are "tokenized", meaning that they are split into a list of segments. For example, if I search for @aws-cdk/cloudfront static_site, the following tokens will be searched on: aws, cdk, cloudfront, static, site. Results are returned by relevance based on field weights, meaning that a package named @aws-cdk/cloudfront would be returned before a package named @aws-cdk/foo that only includes cloudfront in its description.
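As a rough illustration of the tokenization and field weighting described above, here is a small sketch. The splitting rule, the weight values, and the names `tokenize`, `FIELD_WEIGHTS`, and `score` are assumptions made for the example; the real search engine may tokenize and score differently.

```typescript
// Hypothetical sketch, not the actual Construct Hub search implementation.
interface Pkg {
  name: string;
  description: string;
}

// Assumed rule: split on any run of non-alphanumeric characters, lowercase.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Assumed weights: a match in the name counts more than one in the description.
const FIELD_WEIGHTS = { name: 5, description: 1 };

// Sums the field weight for every query token found in each field.
function score(query: string, pkg: Pkg): number {
  const nameTokens = tokenize(pkg.name);
  const descTokens = tokenize(pkg.description);
  let total = 0;
  for (const token of tokenize(query)) {
    if (nameTokens.indexOf(token) !== -1) total += FIELD_WEIGHTS.name;
    if (descTokens.indexOf(token) !== -1) total += FIELD_WEIGHTS.description;
  }
  return total;
}
```

Under these assumptions, `tokenize("@aws-cdk/cloudfront static_site")` produces `aws`, `cdk`, `cloudfront`, `static`, `site`, and a package named `@aws-cdk/cloudfront` scores higher than `@aws-cdk/foo` with "cloudfront" only in its description. It also shows why dropping the `aws` token is awkward: once the query is tokenized, the `aws` from "aws codepipeline" and the `aws` from "aws-cdk" are indistinguishable.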

I will give this suggestion a try, but I believe it will still cause problematic edge cases like before. I would also argue that we are getting acceptable and relevant results with the current behavior. Libraries that match the fields most strongly appear first, while looser matches have lower relevance scores. If you look at other search engines, it feels like the first 10-20% of results are strongly relevant, and beyond that point results are only tangentially related.

gabewomble, Jan 10 '22 21:01

This issue is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

github-actions[bot], Mar 12 '22 01:03