fluent-logger-golang icon indicating copy to clipboard operation
fluent-logger-golang copied to clipboard

Disordered Ack messages will be missed

Open tagomoris opened this issue 3 years ago • 3 comments

Currently, the write() function will wait for the ack message of the message sent. And reading ack messages from a connection will be processed in the order of written messages. If the series of ack messages are disordered (it can happen), arrived messages will be missed and the original messages will be re-sent to the server. It is not a critical problem (because sending messages are retried), but it may cause a problem about highly heavier traffic with at-least-once configuration.

A possible solution could be (solution is not only this way, of course):

  1. write() will set the msg.ack value to a Set (with locking)
  2. write() lets a goroutine read response messages from the connection
  3. the goroutine will read a message from any of connections passed, and remove an AckResp.Ack from the Set (with locking)
  4. write() will wait until the ack value will be removed from the Set (or timeout)
  5. if it timed out, write() will retry to send the message

tagomoris avatar Oct 29 '21 08:10 tagomoris

This problem will last even after merging #82

tagomoris avatar Oct 29 '21 08:10 tagomoris

@tagomoris Are you already working on this? :slightly_smiling_face:

About the implementation details, I have the following solution in mind that would be lock-free:

  1. Start an ackRecv() goroutine in newWithDialer when Ack == true
  2. Use a dedicated channel (eg. named acksCh) to communicate between write() and ackRecv()
  3. Create an ack channel for each message in write() to communicate back from ackRecv() to write()
  4. Send a struct containing the ack ID, the ack channel and bool (with true value) from write() to ackRecv() (before actually writing to the connection, to avoid race conditions)
  5. Block the write() until either the ack channel or a timeout channel got triggered i. If the the timeout channel got triggered, resend the same struct as 4. but with the bool turned false.
  6. Concurrently read from the acksCh and the socket i. When an "ack request" is received from acksCh put it in a local map[string]struct{} if the bool is true or remove it if the bool is false. ii. When an ack response is received from the socket, check if there's a matching ID in the local map and send a struct{} to the appropriate channel and remove it from the local map.
  7. When Close() is called, acksCh got closed i. Set a local shouldClose bool to true in acksRecv() ii. When the last ackMessage is received or removed from the local map (cf. 6.) and shouldClose is true, exit from acksRecv()

My only concern with this solution is regarding #104.

akerouanton avatar Oct 31 '21 12:10 akerouanton

I've not worked on this yet - no plans for now. @akerouanton Your idea was an alternative. I don't have enough knowledge about the creation cost of dedicated channels, so I can't evaluate which is better - but the idea of using channels seems a golang-ish solution.

tagomoris avatar Nov 01 '21 04:11 tagomoris