
Searching for exact ID is not reliable

Open filiptronicek opened this issue 2 years ago • 4 comments

When you search for the Jupyter extension by its exact ID (ms-toolsai.jupyter) on Open VSX [direct search link], the first result is CodeStream.codeStream. I believe this is because we treat extension namespaces and extension names separately, and the dot in the middle prevents better search results.

Maybe we can add the extension id (namespace.extension) to the search criteria or try resolving ID-looking search queries directly.
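
A minimal sketch of the second idea, assuming a hypothetical helper that recognizes ID-looking queries of the form namespace.extension (or namespace/extension) before falling back to full-text search. The class name, the method name, and the allowed character set are assumptions for illustration, not Open VSX code.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtensionIdQuery {
    // Assumed shape of an ID: two segments of letters, digits, '-' or '_',
    // separated by a single '.' or '/'. The real naming rules may differ.
    private static final Pattern ID_PATTERN =
            Pattern.compile("^([A-Za-z0-9_-]+)[./]([A-Za-z0-9_-]+)$");

    /** Returns {namespace, name} in lower case if the query looks like an ID. */
    public static Optional<String[]> parse(String query) {
        if (query == null) {
            return Optional.empty();
        }
        Matcher m = ID_PATTERN.matcher(query.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
                m.group(1).toLowerCase(Locale.ROOT),
                m.group(2).toLowerCase(Locale.ROOT)
        });
    }

    public static void main(String[] args) {
        // An ID-looking query is split into its two parts;
        // anything else would go through the normal search path.
        parse("ms-toolsai.jupyter")
                .ifPresent(id -> System.out.println(id[0] + " / " + id[1]));
    }
}
```

A caller could look up the parsed pair directly and put a hit at the top of the results, so that an exact ID match does not depend on relevance scoring at all.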


filiptronicek avatar Apr 09 '23 12:04 filiptronicek

> Maybe we can add the extension id (namespace.extension) to the search criteria

extensionId is part of the search criteria and has the highest boost.

            boolQuery.should(QueryBuilders.termQuery("extensionId.keyword", options.queryString).caseInsensitive(true)).boost(10);

            // Fuzzy matching of search query in multiple fields
            var multiMatchQuery = QueryBuilders.multiMatchQuery(options.queryString)
                    .field("name").boost(5)
                    .field("displayName").boost(5)
                    .field("tags").boost(3)
                    .field("namespace").boost(2)
                    .field("description")
                    .fuzziness(Fuzziness.AUTO)
                    .prefixLength(2);
            boolQuery.should(multiMatchQuery).boost(5);

            // Prefix matching of search query in display name and namespace
            var prefixString = options.queryString.trim().toLowerCase();
            var namePrefixQuery = QueryBuilders.prefixQuery("displayName", prefixString);
            boolQuery.should(namePrefixQuery).boost(2);
            var namespacePrefixQuery = QueryBuilders.prefixQuery("namespace", prefixString);
            boolQuery.should(namespacePrefixQuery);

Using #684 as a starting point, I think this happens because ms-toolsai.jupyter was last updated earlier (2023-03-10T04:05:53.638673Z) than codestream.codestream (2023-03-24T15:36:43.527142Z), which may give it a lower relevance score.

        var relevance = ratingRelevance * limit(ratingValue) + downloadsRelevance * limit(downloadsValue)
                + timestampRelevance * limit(timestampValue);

@filiptronicek Do you want me to check if this is a common issue for all exact ID searches?

amvanbaren avatar Apr 11 '23 09:04 amvanbaren

That's really interesting. Codestream has about 5K downloads, while Jupyter has about 800K. Maybe download counts could be taken into account as well, since people are more likely to search for popular extensions.

I also found that ms-toolsai/jupyter (note the / instead of .) returns the correct result. Maybe Codestream just has unusual metadata. I think we can keep this issue open in case we run into other examples.

filiptronicek avatar Apr 11 '23 10:04 filiptronicek

It is taken into account, but freshness (timestamp) is prioritized over downloads. From https://github.com/EclipseFdn/open-vsx.org/blob/production/configuration/application.yml:

    relevance:
      rating: 0.2
      downloads: 1.0
      timestamp: 3.0
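
To illustrate how these weights interact with the relevance formula quoted earlier, here is a rough sketch. It assumes that limit() clamps a normalized value to the range [0, 1]. The class name and the input values are hypothetical and are not taken from the real index.

```java
public class RelevanceSketch {
    // Weights from the configuration quoted above.
    static final double RATING_RELEVANCE = 0.2;
    static final double DOWNLOADS_RELEVANCE = 1.0;
    static final double TIMESTAMP_RELEVANCE = 3.0;

    // Assumption: limit() clamps a normalized value to [0, 1].
    static double limit(double value) {
        return Math.max(0.0, Math.min(1.0, value));
    }

    // Same weighted sum as the relevance expression in the thread.
    static double relevance(double ratingValue, double downloadsValue, double timestampValue) {
        return RATING_RELEVANCE * limit(ratingValue)
                + DOWNLOADS_RELEVANCE * limit(downloadsValue)
                + TIMESTAMP_RELEVANCE * limit(timestampValue);
    }

    public static void main(String[] args) {
        // Hypothetical normalized inputs, not measured values.
        double popularButOlder = relevance(0.8, 0.9, 0.5);
        double freshButNiche = relevance(0.8, 0.4, 0.8);
        System.out.println("popular but older: " + popularButOlder);
        System.out.println("fresh but niche:   " + freshButNiche);
    }
}
```

With these made-up inputs the fresher extension scores higher despite having far fewer downloads, because a difference in the timestamp term counts three times as much as the same difference in the downloads term.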

amvanbaren avatar Apr 11 '23 10:04 amvanbaren

https://github.com/dgileadi/vscode-java-decompiler/issues/17#issue-1489959120

Mdnou avatar May 26 '23 05:05 Mdnou