Openfire icon indicating copy to clipboard operation
Openfire copied to clipboard

OF-2824: Addressing data consistency in caches of routing table

Open guusdk opened this issue 1 year ago • 5 comments

The distinct commits in this PR each intend to address a separate issue, where the first three commits are preparing the code for easier application of the fourth commit.

The over-arching goal of all commits is to make the implementation of RoutingTable less error prone (in context of client sessions/routes). it does so by:

  • adding strictness with regards to allowed null and bare vs. full JID usage
  • simplifying the code (doing away with an optimization that I suspect plays a role in the cause of observed data inconsistency)
  • replacing per-cache lock usage to usage of one lock, to guard the manipulation of three related caches. This intends to remove a potential synchronization issue.

guusdk avatar May 08 '24 12:05 guusdk

I've reviewed this commit by commit, and it seems sensible. On locks, do those locks that we no longer acquire (Users, AnonymousUsers) get locked and edited anywhere else? i.e. Could this be trading race conditions?

Fishbowler avatar May 09 '24 13:05 Fishbowler

Those smack tests though 😬

Fishbowler avatar May 09 '24 14:05 Fishbowler

do those locks that we no longer acquire (Users, AnonymousUsers) get locked and edited anywhere else?

No, I've checked (but cannot programatically ensure) that all interaction with those caches either use UsersSession's lock, or no lock at all (in cases were the cache interaction is not specific to one entry - I'm not sure how to improve on that).

guusdk avatar May 09 '24 20:05 guusdk

Those smack tests though 😬

I had introduced fail-fast behavior. Guess what: things started to fail rather fast. I was wrong to assume that the arguments used would always be of the expected type (a full JID rather than a bare JID).

There might be room to improve things, as the invocation with bare JIDs might be a symptom of an underlying issue, but that's potentially a larger refactoring - while it wouldn't fix the same problem in plugins or third-party code.

Given that it appears to be wide-spread, I've now modified to code to, instead of failing immediately, simply prevent an operation that would have a predictable null-result when a bare JID is provided.

guusdk avatar May 09 '24 21:05 guusdk

LGTM 👍

Fishbowler avatar May 10 '24 09:05 Fishbowler

Successfully created backport PR for 4.8:

  • #2471

github-actions[bot] avatar Jun 28 '24 14:06 github-actions[bot]