
User directory search doesn't work when searching for user IDs.

Open anoadragon453 opened this issue 4 years ago • 9 comments

Description

The spec states that the user directory search should be "performed case-insensitively on user IDs and display names preferably using a collation determined based upon the Accept-Language header provided in the request, if present".

The user directory works by performing a full text search on text vectors extracted from user IDs, for example:

synapse=# select * from user_directory_search limit 5;
            user_id            |                   vector                    
-------------------------------+---------------------------------------------
 @negritoofcourse:matrix.org   | 'matrix.org':2 'negritoofcours':1A,3B
 @freenode_jin1200:matrix.org  | 'freenod':1A 'jin1200':2A,4B 'matrix.org':3
 @freenode_Xenguy__:matrix.org | 'freenod':1A 'matrix.org':3 'xenguy':2A,4B
 @morethanabitoff:matrix.org   | 'matrix.org':2 'morethanabitoff':1A,3B
 @curtisthe:matrix.org         | 'curtisth':1A,3B 'matrix.org':2
(5 rows)

The query doesn't seem to take into account searching for user IDs: https://github.com/matrix-org/synapse/blob/ed630ea17c40d328cc0796e35d37287768c7140d/synapse/storage/data_stores/main/user_directory.py#L718-L773

(The user_id bit at the top is only for excluding the current user from the search results and/or users with whom the requester does not share a public room, depending on the value of user_directory_search_all_users.)

The query should be updated to search by the user ID.
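As a rough illustration of what "search by the user ID" could mean, here is a minimal sketch, not Synapse's actual query: alongside the full text match, also accept rows whose user_id starts with the search term, compared case-insensitively. It uses an in-memory SQLite table as a stand-in for user_directory_search; the single-column table layout, the helper names escape_like and search_by_user_id, and the choice of LIKE are all assumptions made for the example.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so that user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_by_user_id(conn: sqlite3.Connection, term: str, limit: int = 10) -> list:
    """Return user IDs that start with `term`, ignoring ASCII case.

    Hypothetical helper. SQLite's LIKE is case-insensitive for ASCII by
    default; on Postgres one would reach for ILIKE or lower() instead.
    """
    pattern = escape_like(term) + "%"
    rows = conn.execute(
        "SELECT user_id FROM user_directory_search "
        "WHERE user_id LIKE ? ESCAPE '\\' ORDER BY user_id LIMIT ?",
        (pattern, limit),
    )
    return [row[0] for row in rows]


# Stand-in table holding a few of the user IDs shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_directory_search (user_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO user_directory_search VALUES (?)",
    [
        ("@freenode_jin1200:matrix.org",),
        ("@freenode_Xenguy__:matrix.org",),
        ("@curtisthe:matrix.org",),
    ],
)

print(search_by_user_id(conn, "@freenode_xenguy__:matrix.org"))
```

The escaping matters because user IDs such as @freenode_jin1200:matrix.org contain underscores, which LIKE would otherwise treat as single-character wildcards.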

Additionally, we don't do anything with respect to "preferably using a collation determined based upon the Accept-Language header provided in the request, if present", but as we're just searching user ID parts here, I'm not sure how that applies.

The tsquery bit specifies 'english'. Perhaps we should be modifying that based on Accept-Language headers.
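If we did want to go down that road, a sketch of picking the text search configuration from the header might look like the following. The function name pick_ts_config and the TS_CONFIGS mapping are hypothetical; the mapping lists only a handful of Postgres's built-in configurations, and nothing like this exists in Synapse as far as this issue shows.

```python
# Hypothetical mapping from primary language subtags to Postgres text search
# configuration names. Only a few of the built-in configurations are listed.
TS_CONFIGS = {"en": "english", "fr": "french", "de": "german", "es": "spanish"}


def pick_ts_config(accept_language, default="english"):
    """Choose a text search configuration from an Accept-Language header."""
    if not accept_language:
        return default

    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed weight
        if quality <= 0:
            continue  # q=0 marks a language as not acceptable
        # Keep only the primary subtag, e.g. "fr" from "fr-CH".
        primary = tag.strip().lower().split("-")[0]
        candidates.append((-quality, position, primary))

    # Highest weight first; header order breaks ties.
    for _, _, primary in sorted(candidates):
        if primary in TS_CONFIGS:
            return TS_CONFIGS[primary]
    return default


print(pick_ts_config("fr-CH, fr;q=0.9, en;q=0.8"))
```

The stored vectors would presumably have to be built with a matching configuration for stemming on both sides to line up, which is part of why it is unclear how much this helps for user ID parts.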

Synapse v1.14.0.

anoadragon453 avatar May 28 '20 11:05 anoadragon453

Related: https://github.com/matrix-org/synapse/issues/3631

I'm wondering if Synapse breaks up a search for @bob:matrix.org into bob and matrix.org, which then fails for some reason. Interestingly, searching for @bob:matrix works fine.

anoadragon453 avatar May 28 '20 11:05 anoadragon453

Looks like we don't handle . correctly.

The user ID gets added to the tsvector as:

erikj=# select to_tsvector('@erikj:jki.re');
     to_tsvector      
----------------------
 'erikj':1 'jki.re':2
(1 row)

However, we convert a search for @erikj:jki.re into '(erikj:* | erikj) & (jki:* | jki) & (re:* | re)', which doesn't match:

erikj=# select to_tsvector('@erikj:jki.re') @@ to_tsquery('english', '(erikj:* | erikj) & (jki:* | jki) & (re:* | re)');
 ?column? 
----------
 f
(1 row)

Maybe our parsing of the user query into postgres query needs some rework?
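To make the mismatch concrete, here is a small Python sketch of the conversion described above. It assumes the search term is split into runs of word characters and hyphens, which reproduces the query string shown earlier; the function names build_tsquery and unmatched_words are made up for this example, and the lexeme check is only a crude approximation of Postgres prefix matching that ignores stemming.

```python
import re


def split_words(search_term: str) -> list:
    """Split a search term into lowercase words.

    Assumption: words are runs of letters, digits, underscores or hyphens,
    so "@", ":" and "." all act as separators.
    """
    return [word.lower() for word in re.findall(r"[\w\-]+", search_term)]


def build_tsquery(search_term: str) -> str:
    """Build a query where each word may match exactly or as a prefix."""
    return " & ".join(
        "(%s:* | %s)" % (word, word) for word in split_words(search_term)
    )


def unmatched_words(search_term: str, lexemes: set) -> list:
    """Return the words that no stored lexeme starts with."""
    return [
        word
        for word in split_words(search_term)
        if not any(lexeme.startswith(word) for lexeme in lexemes)
    ]


print(build_tsquery("@erikj:jki.re"))
# (erikj:* | erikj) & (jki:* | jki) & (re:* | re)

# The vector for @erikj:jki.re holds the lexemes 'erikj' and 'jki.re'.
print(unmatched_words("@erikj:jki.re", {"erikj", "jki.re"}))
# ['re']
```

Because the server name is stored as the single lexeme 'jki.re', the word re has nothing to match and the whole AND query fails. This would also explain why searching for @bob:matrix works: matrix is a prefix of the stored lexeme 'matrix.org'.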

erikjohnston avatar Jun 04 '20 10:06 erikjohnston

"Maybe our parsing of the user query into postgres query needs some rework?"

Related: https://github.com/matrix-org/synapse/issues/7590

anoadragon453 avatar Sep 03 '20 15:09 anoadragon453

I wonder if this is still a problem?

DMRobertson avatar Oct 14 '21 17:10 DMRobertson

$ curlie 'https://vector.modular.im/_matrix/client/r0/user_directory/search' -X POST -H 'Authorization: Bearer [...]' -d '{"search_term":"@babolivier:vector.modular.im"}'

HTTP/2 200 
date: Thu, 14 Oct 2021 17:35:41 GMT
content-type: application/json
content-encoding: gzip
cache-control: no-cache, no-store, must-revalidate
access-control-allow-origin: *
access-control-allow-methods: GET, HEAD, POST, PUT, DELETE, OPTIONS
access-control-allow-headers: X-Requested-With, Content-Type, Authorization, Date
strict-transport-security: max-age=15724800; includeSubDomains
permissions-policy: interest-cohort=()

{
    "limited": false,
    "results": [
        
    ]
}

Looks like it.

(this is on a homeserver running Synapse v1.44.0)

babolivier avatar Oct 14 '21 17:10 babolivier

In Synapse 1.97.0, search is "completely broken" for me. None of the most obvious searches work. Searching by local part or by full identity doesn't work: @ident:server.org is not found when searching for "ident", "ident:server.org", or "@ident:server.org". For a few users, a search by local part does happen to work, but most users are simply not found.

finsterwalder avatar Nov 29 '23 08:11 finsterwalder

I think ideally we would have a separate field in the message search request which takes a user ID to filter search results, rather than converting user IDs to text vectors and hoping postgres surfaces them correctly.

anoadragon453 avatar Nov 29 '23 10:11 anoadragon453

You could add a special search only by user_id local part or matrix id or something. But that's not really the point. The API is fine as is and as specified by the Matrix Specification. It should just work properly... ;-)

finsterwalder avatar Nov 29 '23 17:11 finsterwalder

This document recommends using a regexp instead of a wildcard in the SQL statement. Can we expect a solution along those lines?

bitfriend avatar Dec 11 '23 10:12 bitfriend