synapse
User directory search doesn't work when searching for user IDs.
Description
The spec states that the user directory search should be "performed case-insensitively on user IDs and display names preferably using a collation determined based upon the Accept-Language header provided in the request, if present".
The user directory works by performing a full text search on text vectors extracted from user IDs, for example:
synapse=# select * from user_directory_search limit 5;
 user_id                       | vector
-------------------------------+---------------------------------------------
 @negritoofcourse:matrix.org   | 'matrix.org':2 'negritoofcours':1A,3B
 @freenode_jin1200:matrix.org  | 'freenod':1A 'jin1200':2A,4B 'matrix.org':3
 @freenode_Xenguy__:matrix.org | 'freenod':1A 'matrix.org':3 'xenguy':2A,4B
 @morethanabitoff:matrix.org   | 'matrix.org':2 'morethanabitoff':1A,3B
 @curtisthe:matrix.org         | 'curtisth':1A,3B 'matrix.org':2
(5 rows)
The query doesn't seem to take into account searching for user IDs: https://github.com/matrix-org/synapse/blob/ed630ea17c40d328cc0796e35d37287768c7140d/synapse/storage/data_stores/main/user_directory.py#L718-L773
(The user_id bit at the top is only there to exclude the current user from the search results and/or users with whom the requester does not share a public room, depending on the value of user_directory_search_all_users.)
The query should be updated to search by the user ID.
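As an illustration of the behaviour the spec asks for, here is a minimal sketch of a case-insensitive substring search over both user IDs and display names. It runs against an in-memory SQLite database rather than Synapse's real schema; the table layout, the function name search_directory and the sample rows are assumptions made for this example only, and SQLite's lower() only folds ASCII.

```python
import sqlite3


def search_directory(conn, term, limit=10):
    """Case-insensitive substring match on user_id and display_name."""
    # Escape LIKE wildcards so "_" and "%" in the term are matched literally.
    escaped = (
        term.lower()
        .replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    pattern = "%" + escaped + "%"
    rows = conn.execute(
        "SELECT user_id FROM user_directory "
        "WHERE lower(user_id) LIKE ? ESCAPE '\\' "
        "OR lower(coalesce(display_name, '')) LIKE ? ESCAPE '\\' "
        "ORDER BY user_id LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical, simplified table; not Synapse's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_directory (user_id TEXT PRIMARY KEY, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO user_directory VALUES (?, ?)",
    [
        ("@freenode_Xenguy__:matrix.org", "Xenguy"),
        ("@curtisthe:matrix.org", None),
        ("@erikj:jki.re", "Erik"),
    ],
)

print(search_directory(conn, "@ERIKJ:jki.re"))
```

A plain substring match like this gives up the ranking that full text search provides, so it is better read as a description of the expected results than as a proposed replacement query.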
Additionally, we don't do anything with respect to "preferably using a collation determined based upon the Accept-Language header provided in the request, if present", but as we're just searching user ID parts here, I'm not sure how that applies.
The tsquery bit specifies 'english'. Perhaps we should be modifying that based on Accept-Language headers.
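To make the Accept-Language idea concrete, here is a rough sketch of how a header value could be mapped to a Postgres text search configuration name. The mapping table and the function name pick_ts_config are hypothetical and not part of Synapse; the set of configurations actually available depends on the Postgres installation.

```python
# Hypothetical mapping from primary language subtags to PostgreSQL text search
# configuration names; adjust it to the configurations installed locally.
LANG_TO_TS_CONFIG = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}


def pick_ts_config(accept_language, default="english"):
    """Return the config for the highest-weighted supported language."""
    candidates = []
    for position, item in enumerate((accept_language or "").split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        weight = 1.0  # a missing q parameter means full weight
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    weight = float(param[2:])
                except ValueError:
                    weight = 0.0  # treat a malformed weight as "not wanted"
        primary = tag.split("-")[0]
        if primary in LANG_TO_TS_CONFIG and weight > 0:
            # Sort key: highest weight first, then earliest position.
            candidates.append((-weight, position, LANG_TO_TS_CONFIG[primary]))
    if not candidates:
        return default
    return min(candidates)[2]


print(pick_ts_config("fr-CH, fr;q=0.9, en;q=0.8"))
```

Since user IDs are not natural-language text, it is an open question whether a language-specific configuration helps there at all; the sketch only shows how the header could be interpreted.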
Synapse v1.14.0.
Related: https://github.com/matrix-org/synapse/issues/3631
I'm wondering if Synapse breaks up a search for @bob:matrix.org into bob matrix.org, which fails for some reason. Interestingly, searching for @bob:matrix works fine.
Looks like we don't handle "." correctly.
The user ID gets added to the tsvector as:
erikj=# select to_tsvector('@erikj:jki.re');
to_tsvector
----------------------
'erikj':1 'jki.re':2
(1 row)
However, we convert a search for @erikj:jki.re into '(erikj:* | erikj) & (jki:* | jki) & (re:* | re)', which doesn't match:
erikj=# select to_tsvector('@erikj:jki.re') @@ to_tsquery('english', '(erikj:* | erikj) & (jki:* | jki) & (re:* | re)');
?column?
----------
f
(1 row)
Maybe our parsing of the user query into postgres query needs some rework?
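The mismatch can be reproduced without a database. The sketch below assumes the search term is split on anything that is not a word character or a hyphen, which yields the query string quoted above; it is not copied from Synapse's code. unmatched_terms is a deliberately simplified model of prefix matching (no stemming, no stop words), applied to the lexemes from the to_tsvector output shown earlier.

```python
import re


def split_terms(search_term):
    # Assumed tokenisation: runs of word characters and hyphens. Dots,
    # colons and the leading "@" act as separators.
    return re.findall(r"[\w\-]+", search_term)


def build_prefix_tsquery(search_term):
    # Each term is matched either as a prefix ("term:*") or exactly.
    return " & ".join("(%s:* | %s)" % (t, t) for t in split_terms(search_term))


def unmatched_terms(search_term, lexemes):
    # Simplified model: a term matches if some stored lexeme starts with it.
    return [
        t
        for t in split_terms(search_term.lower())
        if not any(lexeme.startswith(t) for lexeme in lexemes)
    ]


# Lexemes taken from the to_tsvector('@erikj:jki.re') output above.
lexemes = {"erikj", "jki.re"}

print(build_prefix_tsquery("@erikj:jki.re"))
print(unmatched_terms("@erikj:jki.re", lexemes))
```

Under this model "jki" still matches 'jki.re' as a prefix, while "re" matches nothing because the server name is stored as a single lexeme. That would also be consistent with the earlier observation that searching for @bob:matrix works while @bob:matrix.org does not.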
Related: https://github.com/matrix-org/synapse/issues/7590
I wonder if this is still a problem?
$ curlie 'https://vector.modular.im/_matrix/client/r0/user_directory/search' -X POST -H 'Authorization: Bearer [...]' --d '{"search_term":"@babolivier:vector.modular.im"}'
HTTP/2 200
date: Thu, 14 Oct 2021 17:35:41 GMT
content-type: application/json
content-encoding: gzip
cache-control: no-cache, no-store, must-revalidate
access-control-allow-origin: *
access-control-allow-methods: GET, HEAD, POST, PUT, DELETE, OPTIONS
access-control-allow-headers: X-Requested-With, Content-Type, Authorization, Date
strict-transport-security: max-age=15724800; includeSubDomains
permissions-policy: interest-cohort=()
{
"limited": false,
"results": [
]
}
Looks like it.
(this is on a homeserver running Synapse v1.44.0)
In Synapse 1.97.0 search is "completely broken" for me. None of the most obvious searches work. Searching by localpart or by full user ID doesn't work: @ident:server.org is not found when searching for "ident", "ident:server.org" or "@ident:server.org". Occasionally a search by localpart does work for a particular user, but most users are simply not found.
I think ideally we would have a separate field in the message search request which takes a user ID to filter search results, rather than converting user IDs to text vectors and hoping postgres surfaces them correctly.
You could add a special search only by user_id local part or matrix id or something. But that's not really the point. The API is fine as is and as specified by the Matrix Specification. It should just work properly... ;-)
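One way to keep the API unchanged and still treat full user IDs specially would be to detect them before building the text search query. The helper below is a hypothetical sketch: the name parse_user_id is made up for this example, and it only checks the rough @localpart:server shape rather than the full grammar from the spec.

```python
def parse_user_id(search_term):
    """Return (localpart, server) if the term looks like a full user ID,
    otherwise None so the caller can fall back to full text search."""
    term = search_term.strip()
    if not term.startswith("@") or ":" not in term:
        return None
    # Split on the first colon only, so a port in the server name survives.
    localpart, _, server = term[1:].partition(":")
    if not localpart or not server:
        return None
    return localpart, server


print(parse_user_id("@ident:server.org"))
print(parse_user_id("ident"))
```

A term that parses could then be compared against the user_id column directly, while anything else goes through the existing search path.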
This document recommends using a regexp instead of a wildcard in the SQL statement. Can we expect a fix for this?