bot icon indicating copy to clipboard operation
bot copied to clipboard

Syncer and on_member_join race condition for adding user data to API.

Open sentry[bot] opened this issue 5 years ago • 7 comments
trafficstars

Sentry Issue: BOT-15

This seems to be a result of the syncer retrieving a list of users to add to the API after a member has joined but before the on_member_join listener event has finished adding the data itself to the API, only to have the syncer get around to attempting to add it a second time which is met with a 400 status and a notice that the user exists in the db already.

ResponseCodeError: Status: 400 Response: {'id': ['user with this id already exists.']}
  File "bot/cogs/sync/cog.py", line 43, in sync_guild
    total_created, total_updated, total_deleted = await syncer(self.bot, guild)
  File "bot/cogs/sync/syncers.py", line 218, in sync_users
    'roles': list(user.roles)
  File "bot/api.py", line 109, in post
    await self.maybe_raise_for_status(resp, raise_for_status)
  File "bot/api.py", line 83, in maybe_raise_for_status
    raise ResponseCodeError(response=response, response_json=response_json)

Task exception was never retrieved
future: <Task finished coro=<Sync.sync_guild() done, defined at /bot/bot/cogs/sync/cog.py:35> exception=ResponseCodeError()>

While we could consider just suppressing the exception, the specific exception is a ResponseCodeError. The exception given is vague without additional details and (I have not looked to confirm) may be raised for other reasons than a user ID conflict. The HTTP status code seems to infer that the request is invalid, which is another poorly unspecific response. The only way to see if it's due to a duplicate is via the raw message text returned in the request body, containing a casual explanation of "user with this id already exists". This content doesn't feel rock solid enough to base a reason check on, in the case that the text may change later in the site project at any time.

We should probably consider giving a more appropriate http response code for these types of unique resource violations so they're easily detected and handled, but that's outside of the scope of this issue for now and can be considered sometime further in the future in the site repo.

sentry[bot] avatar Feb 23 '20 16:02 sentry[bot]

Would there be a way to make the syncer wait for event handlers in the cog to finish first? Is that even a good idea? Would the opposite also be an issue i.e. an event handler trying to make a request to the API after the syncer already has?

MarkKoz avatar Feb 23 '20 17:02 MarkKoz

It would likely be easier for any update event listeners (such as on_member_join) to not proceed if it can tell a sync is in process already, perhaps with an asyncio.Lock or asyncio.Event to reference.

scragly avatar Feb 23 '20 17:02 scragly

When the syncer finishes, the listeners will need to know e.g. that the syncer already created the user so the listener shouldn't attempt to do it again. Otherwise, it'd result in the same error.

MarkKoz avatar Feb 23 '20 17:02 MarkKoz

Hmm, we could probably build some special method that can be awaited, and it returns a list of IDs. If it's in the middle of syncing, it'll wait till it's done and return the ID list it processed. If it isn't syncing at that time, it can immediately return with an empty list.

scragly avatar Feb 23 '20 17:02 scragly

Is this still an issue with the new "smart syncer" we've merged (#1165)?

Den4200 avatar Oct 25 '20 16:10 Den4200

It probably still is.

MarkKoz avatar Oct 26 '20 18:10 MarkKoz

There's a similar issue in the on_member_join method (https://python-discord.sentry.io/issues/5216780974) though it seems to be due to the on_member_join event being recieved twice.

I think the best solution in both of these cases is to silently ignore the error. We know in both cases the user was only just created so we can assume it's up to date, and relying on the database is much simpler and more reliable than working out our own locking solution.

IMO it's fine to rely on "already exists" being in the id field error message. A documented error code or status would be nicer, but I don't think the current message would change without consideration for places that may rely on it.

wookie184 avatar Apr 18 '24 12:04 wookie184