Problem: malamute loses stream message when worker leaves

Open vyskocilm opened this issue 9 years ago • 5 comments

The test code is at https://github.com/eaton-bob/alice

How to build

gcc -lmlm -lczmq -lzmq src/worker.c -o worker
gcc -lmlm -lczmq -lzmq src/producer.c -o producer

How to reproduce

  1. start malamute
  2. ./worker
  3. ./worker
  4. ./producer

worker1
reader_cmd:SERVICE DELIVER,reader_sender:METRICS_SERVICE,subject:measurement.power@ups1,content:1000W
reader_cmd:SERVICE DELIVER,reader_sender:METRICS_SERVICE,subject:measurement.th.ambiante@rc,content:50%

worker2
reader_cmd:SERVICE DELIVER,reader_sender:METRICS_SERVICE,subject:measurement.th.temperature@rc,content:10F

  1. kill worker2
  2. ./producer

worker1 got only ONE message:
reader_cmd:SERVICE DELIVER,reader_sender:METRICS_SERVICE,subject:measurement.th.temperature@rc,content:10

Subsequent runs of producer seem to produce all 3 messages.

Gerald's comment: When a worker disconnects from the malamute broker, malamute dispatches message(s) to this dead worker one time and then evicts it; it doesn't try to send the lost message(s) to another live worker.

vyskocilm, Oct 26 '15 13:10

It takes some time for the broker to detect a dead client. Until that time, the broker may forward messages to this client. This is what you observe.

There is a "server/timeout" configuration parameter that sets the time after which an idle client is regarded as dead. By tuning this parameter, you can narrow the window during which a dead client still consumes messages. But you must ensure that there is enough traffic so that a live client is not falsely regarded as dead.
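
As a rough illustration, assuming the broker reads a `malamute.cfg` in the usual ZPL layout, the parameter would sit under the `server` section. The value is an arbitrary example and is assumed here to be in milliseconds; it is not a recommendation:

```
server
    timeout = 5000      #   Example value: silence after which a client is regarded as dead
```

A shorter timeout evicts dead workers sooner, at the cost of evicting quiet but healthy clients if they send nothing within that period.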

hurtonm, Jul 07 '16 10:07

Hi,

Thanks for investigating, @hurtonm.

However, this kind of race makes the SERVICE pattern very unreliable. I'd say that the point of the broker is to ensure that, once it gets the message from a client, a reply will be delivered back to the client. So the broker must track the sent requests and resend them to another client if the previous one disappeared.

@hintjens - as the architect of malamute - do you agree?

vyskocilm, Jul 11 '16 11:07

Since service requests are sent in advance, dead workers will lose them (whatever is in their incoming buffers).

It's probably a bad idea to add retry logic to the broker and the service delivery protocol. Rather, clients that don't get a reply within X (milli)seconds should resend their requests. They can then detect/eliminate duplicate replies via the tracker (do replies copy something else from requests, e.g. a request id or tracker?). Workers must then be idempotent, i.e. processing the same request twice has no further effect.
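
A minimal sketch of that scheme, for illustration only. It is transport-agnostic and does not call the malamute API; every name here (`pending_t`, `worker_t`, and their functions) is hypothetical, and it assumes each request carries a unique tracker string that the worker copies into its reply:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TRACKER_MAX 64      //  Assumed upper bound on tracker length
#define SEEN_MAX    128     //  Assumed cap on trackers a worker remembers

//  Hypothetical client-side record of one outstanding request
typedef struct {
    char tracker [TRACKER_MAX];     //  Unique id sent with the request
    int64_t timeout_ms;             //  The "X (milli)seconds" to wait
    int64_t deadline_ms;            //  When to resend if still unanswered
    int attempts;                   //  How many times the request was sent
    bool answered;                  //  True once the first reply is accepted
} pending_t;

//  Start tracking a request that was sent at now_ms
static void
pending_start (pending_t *self, const char *tracker,
               int64_t now_ms, int64_t timeout_ms)
{
    assert (self && tracker && strlen (tracker) < TRACKER_MAX);
    strcpy (self->tracker, tracker);
    self->timeout_ms = timeout_ms;
    self->deadline_ms = now_ms + timeout_ms;
    self->attempts = 1;
    self->answered = false;
}

//  Return true if the caller should resend now; re-arms the deadline
static bool
pending_should_resend (pending_t *self, int64_t now_ms)
{
    if (self->answered || now_ms < self->deadline_ms)
        return false;
    self->deadline_ms = now_ms + self->timeout_ms;
    self->attempts++;
    return true;
}

//  Return true only for the first reply carrying our tracker;
//  later duplicates and unrelated replies are rejected
static bool
pending_accept_reply (pending_t *self, const char *tracker)
{
    if (self->answered || strcmp (self->tracker, tracker) != 0)
        return false;
    self->answered = true;
    return true;
}

//  Hypothetical worker-side state; must be zero-initialized before use
typedef struct {
    char seen [SEEN_MAX][TRACKER_MAX];  //  Trackers already processed
    int seen_count;
    int applied;                        //  Side effects actually applied
} worker_t;

//  Apply a request at most once; return false if it is a duplicate
static bool
worker_handle (worker_t *self, const char *tracker)
{
    assert (self && tracker && strlen (tracker) < TRACKER_MAX);
    for (int index = 0; index < self->seen_count; index++)
        if (strcmp (self->seen [index], tracker) == 0)
            return false;               //  Seen before: no further effect
    assert (self->seen_count < SEEN_MAX);
    strcpy (self->seen [self->seen_count++], tracker);
    self->applied++;
    return true;
}
```

In use, the client would call `pending_should_resend` from its poll loop and send the same request (same tracker) again whenever it returns true, while the worker would still reply to a duplicate but skip the side effect. A real worker would also need to expire old entries from `seen`, which this sketch omits.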

hintjens, Jul 11 '16 14:07

Hi Pieter,

Thanks for your answer. It sounds plausible, even though I had hoped for more magic in the broker.

We may document this somewhere anyway...

vyskocilm, Jul 11 '16 15:07

You basically need magic, in the client and worker APIs. End-to-end principle. It comes to less work overall, and is invisible to the user.

hintjens, Jul 11 '16 16:07