Prozess
'This socket has been ended by the other party' error is not handled
I ran into an issue where Prozess does not reconnect after I take Kafka down. Stepping through the code, it looks like the 'This socket has been ended by the other party' error I was getting is not handled at Producer.js:79, so the callback is called without any attempt to "_reconnect".
Am I missing anything?
Thanks, Vladimir
I don't think you're missing anything. This sounds like a legitimate issue (that we haven't seen yet in production). Thanks for logging it.
Hmmm... on second thought, I just tested this (with produce_test.js), and I can't reproduce the issue. In your scenario, a subsequent call to Producer.send() will result in a failed write, which is caught in Producer.js:78 and as far as I can tell should always eventually be a "This socket is closed" error. I can imagine that there's a possibility that 'This socket has been ended by the other party' might be thrown mid-send, but a subsequent call to send() should still reconnect it. This seems to be the case in my tests at least. Obviously a failed send needs to be retried in either case.
The call to _reconnect exists for when the server comes back up. It's not actually trying to make sure your original send() works. It's just trying to ensure that you can keep retrying send() without having to manually reconnect(). You'll have to write your own resend logic in the case of errors.
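To illustrate what client-side resend logic could look like, here is a minimal sketch. It assumes the producer exposes `send(message, callback)` with a Node-style callback; `sendWithRetry`, `maxAttempts`, and `delayMs` are hypothetical names made up for this example and are not part of Prozess.

```javascript
// Hypothetical helper: retries producer.send() until it succeeds
// or the attempt budget is used up. Not part of Prozess.
function sendWithRetry(producer, message, options, callback) {
  var maxAttempts = options.maxAttempts || 5;
  var delayMs = typeof options.delayMs === 'number' ? options.delayMs : 1000;
  var attempt = 0;

  function trySend() {
    attempt += 1;
    producer.send(message, function (err) {
      if (!err) {
        // Success: report how many attempts it took.
        return callback(null, attempt);
      }
      if (attempt >= maxAttempts) {
        // Out of attempts: hand the last error back to the caller.
        return callback(err, attempt);
      }
      // Wait before retrying so a broker that is down is not hammered.
      setTimeout(trySend, delayMs);
    });
  }

  trySend();
}
```

A caller would use it as `sendWithRetry(producer, message, { maxAttempts: 10, delayMs: 500 }, callback)`. A fixed delay keeps the sketch short; a backoff that grows between attempts may be a better fit for longer outages.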
Does that make sense?
Gregg,
Thanks for the quick response. It does make sense, but unfortunately (observing the flow in the debugger) I keep getting "This socket has been ended by the other party" on subsequent retries. I had to add it as an OR condition on line 79 of Producer.js, and it works great now.
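A rough sketch of that kind of check, for illustration only: this is not the actual Producer.js code, and `isDisconnectError` and `DISCONNECT_MESSAGES` are hypothetical names. It assumes the error is matched on its message text and uses the two messages quoted in this thread.

```javascript
// Error messages quoted in this thread. Treating both as "connection lost"
// is an assumption about what the check on line 79 should cover.
var DISCONNECT_MESSAGES = [
  'This socket is closed',
  'This socket has been ended by the other party'
];

// Hypothetical helper: true when err.message contains one of the messages above.
function isDisconnectError(err) {
  if (!err || typeof err.message !== 'string') {
    return false;
  }
  return DISCONNECT_MESSAGES.some(function (text) {
    return err.message.indexOf(text) !== -1;
  });
}
```

Matching on a substring rather than the full string is a deliberate choice here, since the exact wording and punctuation of these messages may differ between Node versions.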
I was only taking the Kafka nodes down and keeping ZooKeeper up, if that makes any difference. Could it be version-related? I'm running:
- Kafka - 0.7.2
- Zookeeper - 3.3.4
- Node - v0.10.15
Thanks, Vladimir
Well first off, don't get me wrong: if it's a real bug, we want to patch it. :) Also, I believe what you're saying is happening.
What I'm saying though is that the send() does not need to reconnect in that scenario in order for this to work. You just need to send() again in your client code (possibly multiple times). The reconnect code will run when the "This socket is closed" error eventually fires.
I think (and I could be wrong) that this is only an issue because you're using a debugger and you expect it to recover immediately. When the broker is down in real life, you would be retrying your send() constantly, in which case the existing code would be fine. Conversely, it is highly unlikely (without a debugger to freeze time) that the broker would be available for a reconnect right after "This socket has been ended by the other party" gets thrown.
Do you understand what I mean? Let me know if you do, and I'm still missing something. Thanks!