Complement `Existing members see new members' presence` is flaky
It took me 363 runs (~3 hours of CI) to reproduce this, so maybe it's not the highest priority flake around.
The issue seems to be that Bob's presence information is not coming down sync.
- Complement test link: https://github.com/matrix-org/complement/blob/d784f7d96677e8b4267da9729d5c601c1e99e741/tests/csapi/rooms_members_local_test.go#L38-L61
- Equivalent SyTest link: https://github.com/matrix-org/sytest/blob/a94cd1dc2d6102e5a9e94659e2dca243e9b72208/tests/30rooms/02members-local.pl#L104-L117
- Logs for failing test: TestMembersLocal.log
I believe we expect Bob's presence to come down sync because /join involves sending an event, and that should update his presence.
I can't see the code that should trigger this in Synapse, but since this succeeds most of the time it must be there.
In the test logs for this test, I didn't spot any requests to /_synapse/replication/presence_set_state/ for Bob, which I would have expected to see. (I do see some for Alice.)
For this reason, I hesitate to brush it off as 'slow replication'. I also note that the requests in the log don't seem to be that slow around the time of the failure, so I don't suspect CPU contention.
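For anyone repeating the log check above, here is a rough sketch of how it could be scripted. The helper name and the assumption that the user ID appears in the logged request line (either raw or percent-encoded) are mine, not something the Complement log format guarantees.

```python
from urllib.parse import quote

# Endpoint prefix mentioned above.
PRESENCE_PATH = "/_synapse/replication/presence_set_state/"


def presence_replication_lines(log_lines, user_id):
    """Return the log lines that hit the presence replication endpoint for user_id.

    Hypothetical helper: assumes the user ID is visible in the logged line,
    either raw ("@bob:hs1") or percent-encoded ("%40bob%3Ahs1").
    """
    needles = {user_id, quote(user_id, safe="")}
    return [
        line
        for line in log_lines
        if PRESENCE_PATH in line and any(needle in line for needle in needles)
    ]
```

Calling it with the lines of TestMembersLocal.log and Bob's full user ID should return an empty list if the observation above holds.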
Seen again at https://github.com/matrix-org/synapse/runs/7541314250?check_suite_focus=true#step:4:3959 (on a monolith sqlite deployment?)
https://github.com/matrix-org/synapse/runs/7742123843?check_suite_focus=true
Maybe this is happening more frequently now? Also seen at https://github.com/matrix-org/synapse/runs/7783161292?check_suite_focus=true (on a monolith)
keyword: TestMembersLocal/Parallel/Existing_members_see_new_members'_presence
https://github.com/matrix-org/synapse/runs/8206808042?check_suite_focus=true
https://github.com/matrix-org/synapse/runs/8213472770?check_suite_focus=true
https://github.com/matrix-org/synapse/actions/runs/3080827578/jobs/4978642028#logs
https://github.com/matrix-org/synapse/actions/runs/3127323400/jobs/5073871243
https://github.com/matrix-org/synapse/actions/runs/3137293711/jobs/5095334318
https://github.com/matrix-org/synapse/actions/runs/3144138134/jobs/5109828689
https://github.com/matrix-org/synapse/actions/runs/3244698433/jobs/5321264235
https://github.com/matrix-org/synapse/actions/runs/3250081130/jobs/5333349627
https://github.com/matrix-org/synapse/actions/runs/3280866173/jobs/5402221795
https://github.com/matrix-org/synapse/actions/runs/3321575592/jobs/5489496717
https://github.com/matrix-org/synapse/actions/runs/3369210209/jobs/5588651779
https://github.com/matrix-org/synapse/actions/runs/3375799267/jobs/5602795377
https://github.com/matrix-org/synapse/actions/runs/3439211524/jobs/5736298006
https://github.com/matrix-org/synapse/actions/runs/3462005537/jobs/5780406423
On SQLite in monolith mode, the test fails reliably if modified so that Bob's join appears in an initial sync. Bob's presence is offline(!) and it's only sent during incremental syncs, so this appears to be an issue with the test setup, or Synapse not setting Bob's presence to online upon a join.
We appear to only bump presence when Bob sends a regular message. https://github.com/matrix-org/synapse/blob/86c5a710d8b4212f8a8a668d7d4a79c0bb371508/synapse/handlers/message.py#L1918-L1921
That condition has been around since https://github.com/matrix-org/synapse/commit/f70e622d59e7b97c539ee03ffc02315b4d626b00 from 2014.
However, https://github.com/matrix-org/synapse/commit/b31ec214a5c2bd674814b2d052200963c5e5cbd7 modified the bump so that it only transitions a user from unavailable to online, and changes nothing when the user is offline. So it seems that users intentionally do not go from offline to online when sending messages. https://github.com/matrix-org/synapse/blob/86c5a710d8b4212f8a8a668d7d4a79c0bb371508/synapse/handlers/presence.py#L964-L965
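To make the combined effect of those two conditions concrete, here is a toy model of the behaviour as I read it. This is not Synapse code: the function names are made up, and treating "regular message" as the `m.room.message` event type is my assumption.

```python
ONLINE, UNAVAILABLE, OFFLINE = "online", "unavailable", "offline"


def bumps_presence(event_type: str) -> bool:
    # Assumption: only regular messages trigger a bump, so a join
    # (an m.room.member state event) does not.
    return event_type == "m.room.message"


def bump(state: str) -> str:
    # The bump only promotes unavailable -> online; offline is left untouched.
    return ONLINE if state == UNAVAILABLE else state


def presence_after_event(state: str, event_type: str) -> str:
    """Presence state after the user sends an event of the given type."""
    return bump(state) if bumps_presence(event_type) else state
```

Under this model an offline Bob stays offline both after joining (`m.room.member`) and after sending a message, which matches what the modified test shows.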