nsq
nsqd: stops sending messages after moving time backward
Environment/Pre-Conditions:
- nsqd version: 1.1.0
- Node.js application with nsqjs
Steps to Reproduce:
- Run nsqd and Node.js application.
- Generate some messages. Everything works correctly.
- Move time backward (for ex.: 5 minutes or 1 hour).
- Generate some messages. The subscriber doesn't receive them.
  - After some time (maybe the delay from step 3?) the messages will be published.
  - If I move time forward, the messages will be published immediately.
Actual Result:
Messages aren't sent to subscriber after moving time back.
Expected Result:
Messages are sent to subscriber after moving time back.
I generally expect various things to not work right when system time moves backwards :)
Go 1.9 added "transparent monotonic time" in certain cases: https://golang.org/doc/go1.9#monotonic-time There may be ways we can adjust the nsq code to get monotonic-time based comparisons and timeouts, if nsqd is the problem here rather than the Node.js client library.
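For readers unfamiliar with that Go 1.9 behaviour, here is a minimal sketch (not nsqd code). It assumes only the standard `time` package; the helper names `hasMonotonic` and `demo` are made up for illustration. It shows that a value from `time.Now()` carries a monotonic reading, that `time.Since` uses it, and that `Round(0)` strips it.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// hasMonotonic reports whether t carries a monotonic clock reading.
// time.Time.String appends an "m=±<value>" field when one is present.
func hasMonotonic(t time.Time) bool {
	return strings.Contains(t.String(), " m=")
}

// demo reports whether a fresh time.Now() value has a monotonic reading,
// and whether it still has one after Round(0), which strips it.
func demo() (before, after bool) {
	now := time.Now()
	return hasMonotonic(now), hasMonotonic(now.Round(0))
}

func main() {
	start := time.Now()
	time.Sleep(10 * time.Millisecond)
	// time.Since subtracts the monotonic readings, so a wall-clock
	// adjustment during the sleep would not affect this result.
	fmt.Println("elapsed:", time.Since(start))

	before, after := demo()
	fmt.Println("monotonic before Round(0):", before)
	fmt.Println("monotonic after Round(0): ", after)
}
```

Anything that converts a `time.Time` to an integer (for example `UnixNano()`) uses only the wall-clock part, so comparisons on such integers are not protected by the monotonic reading.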
I think it is certainly nsqd, because restarting the Node.js application didn't help.
Can I help with this issue? Maybe you can point out the modules and methods that should be refactored.
thanks, we'll look into it
Since Go does not use separate functions/options/objects for monotonic time, but keeps it hidden in the wall-clock time struct, it's probably a bit tricky to figure out where it's being lost, or how to carry it along where needed.
I'd be surprised if "normal" channel sends were the problem, so I'd start by looking at other timeout related things. The two that come to mind are:
- network connection level timeouts
- time.Ticker use cases, e.g. to flush buffered messages to a client
Sorry for bothering, but is there any progress? Our team has some resources to investigate this behaviour, so you could simply point us to the potential problem places. We want to help.
No progress.
I suspect the issue may be around https://github.com/nsqio/nsq/blob/master/nsqd/guid.go#L59
Ahhh, yes, I completely forgot about the GUID code, good call!
I had been poking around at all the network deadlines and tickers, but I'm pretty confident they're not the issue.
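To illustrate why a timestamp-based ID generator can stall like this, here is a simplified sketch. It is not the actual code in guid.go; the type `idFactory`, its injectable `now` clock, and the 12-bit sequence width are assumptions chosen for the example. The point is the guard: when the clock reads earlier than the last timestamp used, every call fails until the clock catches up again.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errTimeBackwards   = errors.New("time has gone backwards")
	errSequenceExpired = errors.New("sequence expired")
)

// idFactory is a hypothetical timestamp+sequence ID generator.
type idFactory struct {
	now           func() int64 // injectable clock, e.g. wall-clock milliseconds
	lastTimestamp int64
	sequence      int64
}

// next returns an ID built from the current timestamp and a per-timestamp
// sequence. It refuses to produce IDs while the clock is behind the last
// timestamp it has already used.
func (f *idFactory) next() (int64, error) {
	ts := f.now()
	if ts < f.lastTimestamp {
		return 0, errTimeBackwards
	}
	if ts == f.lastTimestamp {
		f.sequence = (f.sequence + 1) & 0xFFF
		if f.sequence == 0 {
			return 0, errSequenceExpired
		}
	} else {
		f.sequence = 0
	}
	f.lastTimestamp = ts
	return ts<<12 | f.sequence, nil
}

func main() {
	clock := int64(1000)
	f := &idFactory{now: func() int64 { return clock }}

	id, err := f.next()
	fmt.Println(id, err) // succeeds

	clock = 700 // simulate moving system time backward
	_, err = f.next()
	fmt.Println(err) // fails until the clock reaches 1000 again

	clock = 1000
	_, err = f.next()
	fmt.Println(err) // succeeds again once the clock has caught up
}
```

This would match the reported symptoms: messages resume after roughly the amount of time the clock was moved back, or immediately if the clock is moved forward.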
Can we add a clock sequence (14 bits) to the GUID? That field helps with backwards time travel. More here: https://blog.stephencleary.com/2010/11/few-words-on-guids.html
Hmmm, I'm not sure we have space for that.
I think we can just use a time duration rather than timestamp, and Go will transparently handle the monotonicity for us.
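A rough sketch of that idea, assuming a hypothetical `monoClock` helper (not part of nsqd): read the wall clock once at startup, then derive every later timestamp by adding `time.Since(start)`, which is computed from the monotonic reading.

```go
package main

import (
	"fmt"
	"time"
)

// monoClock is a hypothetical helper that yields timestamps which never
// decrease within one process, even if the wall clock is set back.
type monoClock struct {
	start time.Time // from time.Now(), so it carries a monotonic reading
	base  int64     // wall-clock nanoseconds captured at startup
}

func newMonoClock() *monoClock {
	now := time.Now()
	return &monoClock{start: now, base: now.UnixNano()}
}

// nowNano returns the startup wall-clock time plus the monotonic duration
// elapsed since then.
func (c *monoClock) nowNano() int64 {
	return c.base + int64(time.Since(c.start))
}

func main() {
	c := newMonoClock()
	a := c.nowNano()
	time.Sleep(5 * time.Millisecond)
	b := c.nowNano()
	fmt.Println("advanced by:", time.Duration(b-a))
}
```

One limitation of this approach: the base is re-read from the wall clock on every restart, so it only protects against time moving backward while the process is running.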
The last time we visited this issue was #658 / #663
We didn't want to drastically change the GUID generation algorithm. Although we'd like to say the only guarantee is that IDs are unique within a particular consumer connection to nsqd, it's possible that some users may implicitly depend on the uniqueness within a channel across multiple nsqd and restarts of nsqd.
An alternate algorithm that would lose some of that across-sources-and-restarts uniqueness, but be compatible with the odd and inadvisable condition of time going backwards: start at a random initial value, and just increment and wrap (similar to TCP sequence numbers, but 64-bit).
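A minimal sketch of that alternate algorithm, with a hypothetical `seqIDs` type (an illustration only, not a proposed patch): seed a 64-bit counter from `crypto/rand`, then increment atomically and let unsigned overflow wrap.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync/atomic"
)

// seqIDs is a hypothetical ID source that does not depend on the clock.
type seqIDs struct {
	next uint64
}

// newSeqIDs seeds the counter with a random 64-bit starting value.
func newSeqIDs() (*seqIDs, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return nil, err
	}
	return &seqIDs{next: binary.BigEndian.Uint64(b[:])}, nil
}

// newID returns the next ID. The addition wraps around to 0 after the
// maximum uint64 value, like a TCP sequence number but 64 bits wide.
func (s *seqIDs) newID() uint64 {
	return atomic.AddUint64(&s.next, 1)
}

func main() {
	s, err := newSeqIDs()
	if err != nil {
		panic(err)
	}
	fmt.Println(s.newID())
	fmt.Println(s.newID())
}
```

IDs from one such counter are unique until it wraps, but two nsqd instances (or one instance across restarts) could produce overlapping values, which is the loss of uniqueness mentioned above.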
Hmmm, I do vaguely remember discussing this and being frustrated about the "backwards compatibility".