openbazaar-go icon indicating copy to clipboard operation
openbazaar-go copied to clipboard

Inbound Message Scanner retries failures indefinitely

Open placer14 opened this issue 5 years ago • 5 comments

After analyzing the error logs being returned from TestPerformTaskInboundMessageScanner in core/inbound_message_scanner_test.go I found the PerformTask function to never omit records which produce errors during processing, only marking them as resolved if completing the task sucessfully.

This problem also indicates that our unit test does not provide any error output for PerformTask for feedback about worker failures, which was occurring in this unit test due and being missed.

I recommend the following improvements:

  • PerformTask should return errors and make error handling a function of the caller
  • The unit test should provide a fake logger which pipes logger calls into the test (Ex: log.Error will trigger a t.Error call with the error content or log.Info will similarly trigger t.Log. Appropriate log levels triggering appropriate test reporting.)
  • Fix the InboundMessageScanner.PerformTask logic to handle errors so they resolve to a final state instead of skipping over the record for them to repeat the same path on next Perform.

placer14 avatar Dec 05 '19 23:12 placer14

What is the direct effect of this problem?

hoffmabc avatar Dec 06 '19 00:12 hoffmabc

Worst case: Any errored message which gets processed by this could potentially be processed more than once depending on where the message processing fails. Not all of our message handling is designed to be idempotent so the results would be undefined.

Currently, the only messages which persist with an error are pb.Message_ORDER_COMPLETION messages, but there's is nothing limiting it to just this type.

Best case: We spend CPU cycles we don't need to... which will accumulate over time.

placer14 avatar Dec 06 '19 00:12 placer14

What would be an errored messaged in this case? How could a message be errored?

hoffmabc avatar Dec 06 '19 00:12 hoffmabc

Currently, the only messages which persist with an error are pb.Message_ORDER_COMPLETION messages, but there's is nothing limiting it to just this type.

Errored message means an OB message which is persisted with an error. This is the only case of it so far: https://github.com/OpenBazaar/openbazaar-go/blob/40f5ac4b53ce0c94bef7281cebe0072127a68634/net/service/handlers.go#L1241-L1244

placer14 avatar Dec 06 '19 00:12 placer14

Not all of our message handling is designed to be idempotent

@cpacia note for v3 - IMHO every single message without exception should be idempotent. With the fickle nature of this type of networking it's simply not possible for clients to always reliably know whether messages were received and IMHO it's just easier for clients, unless an ACK was clearly received, to be able to just re-send messages it even mildly suspect may have not been received and do so without fear of undesired results.

IMHO, it should just be an inherent part of the protocol that all messaging (offline or online) now and in the future is idempotent.

rmisio avatar Dec 06 '19 15:12 rmisio