Constrain the ement-room--event-mentions-user-p regexp

Open phil-s opened this issue 2 years ago • 15 comments

At present the word uphill in a message is treated as mentioning the username phil :)

Let's at minimum wrap the pattern with word boundaries?

(Or do we then worry about usernames which do not begin/end with word characters? Either way, I'm sure we can improve this...)
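To make the false positive concrete, here is a minimal sketch using plain `string-match-p`. It assumes the current check amounts to an unanchored match on the name, which is what the "uphill" behaviour suggests; `my/mentions-naive-p` and `my/mentions-bounded-p` are hypothetical helpers, not ement.el functions.

```elisp
;; Hypothetical helpers for illustration only; not ement.el's actual code.
(defun my/mentions-naive-p (name body)
  "Return non-nil if NAME occurs anywhere in BODY, even inside a word."
  (string-match-p (regexp-quote name) body))

(defun my/mentions-bounded-p (name body)
  "Return non-nil if NAME occurs in BODY between word boundaries."
  ;; \b depends on the syntax table, so pin it for predictable results.
  (with-syntax-table (standard-syntax-table)
    (string-match-p (concat "\\b" (regexp-quote name) "\\b") body)))

(my/mentions-naive-p "phil" "It was uphill all the way")   ; non-nil: false positive
(my/mentions-bounded-p "phil" "It was uphill all the way") ; nil
(my/mentions-bounded-p "phil" "Thanks, phil!")             ; non-nil
```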

phil-s, Nov 07 '23 00:11

Makes sense. Thanks.

alphapapa, Nov 07 '23 17:11

I looked to see if there were restrictions on characters used for a user's display name, and I'm not seeing anything. Searching for displayname at https://spec.matrix.org/v1.8/client-server-api/ gives lots of info, but it seems like the value may be "any string".

So we can't simply wrap the quoted display name with word boundaries, but I suspect that even if the user had non-word characters at the start or end of their display name, we could assume that any mention of their name will have non-word characters (or bol/eol) surrounding the display name. So maybe something like this:

    "\\(?:^\\|[^[:word:]]\\)\\(DISPLAYNAME\\)\\(?:$\\|[^[:word:]]\\)"

Possibly in the form of a rx-based template in a new variable or user option.
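A rough sketch of what such an `rx`-based template might look like. The function names are hypothetical and nothing here is existing ement.el API; the display name is passed to `rx` as a string, so it is matched literally without a separate `regexp-quote`.

```elisp
(require 'rx)

;; Hypothetical helpers; the names and shape are assumptions for illustration.
(defun my/displayname-mention-regexp (displayname)
  "Return a regexp matching DISPLAYNAME delimited by non-word characters.
DISPLAYNAME is matched literally and captured in group 1."
  (rx-to-string
   `(seq (or bol (not (any word)))
         (group ,displayname)
         (or eol (not (any word))))
   t))

(defun my/displayname-mentioned-p (displayname body)
  "Return non-nil if BODY contains DISPLAYNAME as a delimited name."
  ;; [:word:] depends on the syntax table, so pin it for predictable results.
  (with-syntax-table (standard-syntax-table)
    (string-match-p (my/displayname-mention-regexp displayname) body)))

(my/displayname-mentioned-p "phil" "phil: ping")    ; non-nil
(my/displayname-mentioned-p "phil" "going uphill")  ; nil
```

Turning the template into a user option would mostly mean storing the quoted `rx` form in a variable and splicing the name in at match time.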

phil-s, Nov 26 '23 12:11

It's kind of hidden here: https://spec.matrix.org/v1.8/appendices/#user-identifiers

alphapapa, Nov 26 '23 16:11

Ah, yes, we have different rules for username and displayname (I was only thinking about displayname).

At present ement-room--event-mentions-user-p uses the same code for both. In fact it matches on three names...

    (or (matches-body-p (ement-user-username user))
        (matches-body-p (ement--user-displayname-in room user))
        (matches-body-p (ement-user-id user)))))))

Where username is the "Username part of user's Matrix ID." in the user struct; so I guess we apply the character constraints from your spec URL to both ement-user-username and ement-user-id, and only ement--user-displayname-in should be the more permissive one.

phil-s, Nov 26 '23 23:11

IIUC, doing this would fix the issue for both the username and the user ID, right?

    (or (matches-body-p (format "\\b%s\\b" (ement-user-username user)))
        (matches-body-p (ement--user-displayname-in room user))
        (matches-body-p (format "\\b%s\\b" (ement-user-id user))))))))

Can we do this first and deal with the user displayname after?
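One caveat that may be worth checking first: in Emacs regexps, `\b` only matches between a word and a non-word character, or at the very start or end of the string. Assuming `ement-user-id` returns the full Matrix ID with its leading `@` (not confirmed in this thread), the leading `\b` would fail whenever the ID follows a space. The sketch below also applies `regexp-quote` so that the `.` in the ID is literal; whether `matches-body-p` already quotes its argument is not shown here. The helper `my/bounded-match-p` and the sample ID are hypothetical.

```elisp
;; Hypothetical helper mirroring the proposed (format "\\b%s\\b" ...) wrapping.
(defun my/bounded-match-p (name body)
  "Return non-nil if NAME, wrapped in \\b anchors, matches in BODY."
  (with-syntax-table (standard-syntax-table)
    (string-match-p (format "\\b%s\\b" (regexp-quote name)) body)))

(my/bounded-match-p "phil" "ping phil about it")                    ; non-nil
(my/bounded-match-p "phil" "an uphill walk")                        ; nil
;; No word boundary exists between " " and "@", so this mention is missed:
(my/bounded-match-p "@phil:example.org" "ping @phil:example.org")   ; nil
;; At the start of the string \b matches regardless of the next character:
(my/bounded-match-p "@phil:example.org" "@phil:example.org: ping")  ; non-nil
```

If that assumption holds, the username could take the `\b` wrapping as proposed, while the full ID might need a delimiter of the "start of line or non-word character" kind instead.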

Konubinix, Nov 28 '23 09:11

About the displayname, it sounds like there is no silver bullet.

For example, I think the displayname ":-) me :-)" can be considered a mention in "Hello :-) me :-)" but also in "Hello:-) me :-)".

But the displayname ". Therefore" would be matched against "Bla bla. Therefore bla bla", which seems counterintuitive.
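As a rough check of these two cases, the sketch below compares a plain substring match (assumed here to approximate the current behaviour) with the "non-word characters or bol/eol" pattern suggested earlier in the thread. Both helpers are hypothetical.

```elisp
;; Hypothetical helpers for comparison; not ement.el code.
(defun my/substring-mention-p (name body)
  "Return non-nil if NAME occurs anywhere in BODY."
  (string-match-p (regexp-quote name) body))

(defun my/delimited-mention-p (name body)
  "Return non-nil if NAME occurs in BODY surrounded by non-word chars or bol/eol."
  (with-syntax-table (standard-syntax-table)
    (string-match-p
     (format "\\(?:^\\|[^[:word:]]\\)\\(%s\\)\\(?:$\\|[^[:word:]]\\)"
             (regexp-quote name))
     body)))

(my/substring-mention-p ":-) me :-)" "Hello :-) me :-)")             ; non-nil
(my/delimited-mention-p ":-) me :-)" "Hello :-) me :-)")             ; non-nil
(my/delimited-mention-p ":-) me :-)" "Hello:-) me :-)")              ; nil ("o" precedes the name)
(my/substring-mention-p ". Therefore" "Bla bla. Therefore bla bla")  ; non-nil
(my/delimited-mention-p ". Therefore" "Bla bla. Therefore bla bla")  ; nil ("a" precedes the name)
```

So the delimited pattern drops the counterintuitive ". Therefore" match, but it also drops "Hello:-) me :-)", which arguably is a mention. Neither rule fits every display name.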

I guess that to deal with all cases, we should provide a custom function that users may define to match their own displayname, and provide one by default. My opinion would be to provide a default that keeps the current behavior.
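A minimal sketch of how such an option could look, assuming the current behaviour is a plain substring match. Every name below (`my/ement-mentions`, `my/ement-displayname-mention-function`, and the helpers) is hypothetical and not part of ement.el.

```elisp
;; Hypothetical option and helpers; names are assumptions for illustration.
(defgroup my/ement-mentions nil
  "Sketch of mention-matching options."
  :group 'applications)

(defun my/ement-displayname-substring-p (displayname body)
  "Return non-nil if DISPLAYNAME occurs anywhere in BODY."
  (string-match-p (regexp-quote displayname) body))

(defcustom my/ement-displayname-mention-function
  #'my/ement-displayname-substring-p
  "Function deciding whether a message body mentions a display name.
It is called with two strings, DISPLAYNAME and BODY, and should
return non-nil if BODY mentions DISPLAYNAME."
  :type 'function
  :group 'my/ement-mentions)

(defun my/ement-displayname-mentioned-p (displayname body)
  "Return non-nil if BODY mentions DISPLAYNAME per the user's function."
  (funcall my/ement-displayname-mention-function displayname body))
```

A user with an awkward display name could then set the option to their own predicate, while everyone else keeps the default.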

Konubinix, Nov 28 '23 10:11