
Isolated incident of malicious workspace getting assigned twice

zjmiller opened this issue 5 years ago • 0 comments

Workspace c78c99cb-889e-4871-a2f2-0f2c61ab9930 was assigned twice. (I manually deactivated the second generated judge workspace to get the tree back on track.)

My guess is that after the first user submitted an answer candidate, which succeeded, the attempt to mark the workspace as ineligible for scheduling (i.e., "not stale") failed for some reason. These are separate GraphQL requests, so there's no guarantee that if one succeeds the other will also succeed, though this is the first case I'm aware of where one of them failed.

Heroku was experiencing platform-wide issues at the time (https://status.heroku.com/incidents/18920), which could explain why one request succeeded while the other failed.

Some things we should do:

  • Run SQL queries to give us more confidence that this issue has only occurred once (see the query sketch after this list):
    • a query looking for any expert workspace that (1) is eligible for scheduling and (2) has children
    • a query looking for any expert workspace that has >1 child
  • Users already get an alert if they are assigned to an expert workspace that has already submitted an answer candidate. This alert tells them to email me (which is how I found out about this particular case). So far I've only gotten one email, and I'm aware of only one instance where this occurred. Better would be to install Sentry on the frontend and automatically log whenever this situation occurs (see the Sentry sketch after this list).
  • Look into restructuring the backend so that a single GraphQL request both marks the workspace as ineligible for scheduling and submits the answer candidate (and potentially wraps everything in a transaction, depending on how much we want to rule out this error in the future); a sketch of what that combined mutation could look like is after this list. This is probably not worth a time-intensive refactor unless this situation occurs again, given my current hypothesis that the Heroku-wide disruption was responsible (and thankfully Heroku doesn't have issues like this very often).
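
A rough sketch of the two consistency checks, run as raw queries through Sequelize. The table and column names (`workspaces`, `parentId`, `isEligibleForAssignment`, `type`) are assumptions about our schema and would need to be adjusted to match the actual models.

```typescript
import { QueryTypes, Sequelize } from "sequelize";

// Assumes DATABASE_URL points at the Mosaic Postgres instance.
const sequelize = new Sequelize(process.env.DATABASE_URL as string);

async function findSuspectExpertWorkspaces() {
  // (1) Expert workspaces that are still eligible for scheduling even though
  //     they already have children -- they should have been marked ineligible
  //     when the answer candidate was submitted.
  const eligibleWithChildren = await sequelize.query(
    `SELECT w.id
       FROM workspaces w
      WHERE w."type" = 'EXPERT'
        AND w."isEligibleForAssignment" = true
        AND EXISTS (SELECT 1 FROM workspaces c WHERE c."parentId" = w.id)`,
    { type: QueryTypes.SELECT }
  );

  // (2) Expert workspaces with more than one child, i.e. workspaces that
  //     were assigned (and answered) more than once.
  const multipleChildren = await sequelize.query(
    `SELECT w.id, COUNT(c.id) AS child_count
       FROM workspaces w
       JOIN workspaces c ON c."parentId" = w.id
      WHERE w."type" = 'EXPERT'
      GROUP BY w.id
     HAVING COUNT(c.id) > 1`,
    { type: QueryTypes.SELECT }
  );

  console.log({ eligibleWithChildren, multipleChildren });
}

findSuspectExpertWorkspaces();
```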
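For the frontend, a minimal sketch of what automatic reporting with @sentry/browser could look like. The DSN, the function name, and the place it gets called from are placeholders; it would hook into wherever the existing "email me" alert is shown today.

```typescript
import * as Sentry from "@sentry/browser";

// One-time setup, e.g. in the frontend entry point. The DSN is a placeholder.
Sentry.init({ dsn: "https://<key>@sentry.io/<project>" });

// Called from the same code path that currently shows the alert, i.e. when a
// user is assigned to an expert workspace that already has an answer candidate.
function reportDoubleAssignment(workspaceId: string) {
  Sentry.captureMessage(
    `Expert workspace ${workspaceId} assigned after an answer candidate was already submitted`,
    "error"
  );
}
```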
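And a sketch of the combined backend mutation, assuming a Sequelize-backed resolver. `Workspace`, `isEligibleForAssignment`, and `createAnswerCandidate` are stand-ins for whatever our actual models and helpers are called; the point is just that both writes share one transaction, so either both persist or neither does.

```typescript
import { Sequelize, Transaction } from "sequelize";

// Hypothetical resolver body for a single mutation that replaces the two
// separate GraphQL requests.
async function submitAnswerAndRetireWorkspace(
  sequelize: Sequelize,
  Workspace: any, // placeholder for the actual Sequelize model
  createAnswerCandidate: (
    workspaceId: string,
    answer: string,
    t: Transaction
  ) => Promise<void>,
  workspaceId: string,
  answer: string
) {
  await sequelize.transaction(async (t) => {
    // Record the answer candidate...
    await createAnswerCandidate(workspaceId, answer, t);
    // ...and mark the workspace ineligible for scheduling in the same transaction.
    await Workspace.update(
      { isEligibleForAssignment: false },
      { where: { id: workspaceId }, transaction: t }
    );
  });
}
```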

zjmiller · Aug 31 '19 21:08