magda icon indicating copy to clipboard operation
magda copied to clipboard

Should remove racing condition in hooks service when hook is in async mode

Open mwu2018 opened this issue 4 years ago • 0 comments

Describe the bug When a hook (e.g. a minion) is operating in async mode, the hooks service might process messages in wrong order, causing the hook's iswaitingforresponse always tested to be true therefore stopping the minion from processing records.

The expected message flow should be

  1. When minion interested records are created or updated, WebHookProcessor will send those records to the minion and wait for minion's response.
  2. If the minion is operating in async mode, the WebHookProcessor will receive deferResponse: true response from the minion. When processing this response, it will set iswaitingforresponse to true.
  3. When the minion finishes processing the received records, it will send ACK request to the hooks service. Then the hooks service will immediately set iswaitingforresponse to false and start sending next batch of records to the minion if there are any.

However, the implementation of the above message processing has a pitfall: iswaitingforresponse might be set in a reversed order because the deferResponse: true response is processed via a message queue while ACK is not. If WebHookProcessor does not process deferResponse: true fast enough, the ACK message might be processed before it, causing iswaitingforresponse to be incorrectly set to true forever.

To Reproduce Steps to reproduce the behavior:

  1. Create a minion and set it to async mode.
  2. Simulate slow processing (10 seconds): Before https://github.com/magda-io/magda/blob/7a55fe884a77842210c96e848f97bc9001a19975/magda-registry-api/src/main/scala/au/csiro/data61/magda/registry/WebHookProcessor.scala#L259, insert Thread.sleep(10000).
  3. Create some new records that will be automatically sent to the minion, make sure a batch of records can be processed by the minion in less than 10 seconds.
  4. The registry log will report that the minion is busy. The hooks table will show that iswaitingforresponse is true and never change. The rest of records will not be processed.

Expected behavior All records should be processed eventually.

Technical Note A possible solution is to process ACK message in the same message queue as deferResponse: true, which guarantees deferResponse: true will be processed before ACK. Therefore iswaitingforresponse can be set correctly.

mwu2018 avatar Sep 30 '20 02:09 mwu2018