
Message loss when scaling up nsqd

Open penghuazhou opened this issue 9 months ago • 2 comments

Case: topic topic1 has two channels, channel1 and channel2. If I scale up by adding one nsqd, and the producer and the channel1 consumer connect to it, channel1 consumes the topic1 data. If the channel2 consumer connects to that nsqd 10 seconds later, channel2 loses 10 seconds of data.

Proposed solution: I think all consumers should connect to the nsqd before the producer connects to it. But if the nsqd does not have the topic yet, nsqlookupd has no info about that nsqd, so consumers cannot find it. Only after the producer connects to the nsqd does nsqlookupd have the nsqd info.

penghuazhou avatar Mar 16 '25 03:03 penghuazhou

If the topic and channels already exist on another nsqd, then when a new nsqd gets the first message of a new topic, it should fetch all channels that should exist from nsqlookupd, and create them immediately before the first message flows through. I'm not sure where this is documented, but it's been a feature since near the beginning ...

https://github.com/nsqio/nsq/blob/3103474b6c5afe8feca0a598797e82c06ed726e3/nsqd/nsqd.go#L508-L533
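To illustrate the behavior described above, here is a minimal sketch of asking nsqlookupd which channels are registered for a topic. It is not the nsqd implementation linked above. It assumes the nsqlookupd HTTP endpoint `GET /channels?topic=<topic>` and the unwrapped `{"channels": [...]}` response shape of NSQ 1.0 and later. The address `127.0.0.1:4161` and the helper names `parseChannels` and `lookupChannels` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// channelsResponse mirrors the JSON body assumed to be returned by
// nsqlookupd's GET /channels endpoint (NSQ 1.0+ response shape).
type channelsResponse struct {
	Channels []string `json:"channels"`
}

// parseChannels decodes the response body into a list of channel names.
func parseChannels(body []byte) ([]string, error) {
	var resp channelsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode channels response: %w", err)
	}
	return resp.Channels, nil
}

// lookupChannels (hypothetical helper) asks one nsqlookupd for the
// channels it knows about for the given topic.
func lookupChannels(lookupdHTTPAddr, topic string) ([]string, error) {
	u := fmt.Sprintf("http://%s/channels?topic=%s", lookupdHTTPAddr, url.QueryEscape(topic))
	res, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", res.Status)
	}
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, err
	}
	return parseChannels(body)
}

func main() {
	// Assumed nsqlookupd HTTP address; adjust for your deployment.
	channels, err := lookupChannels("127.0.0.1:4161", "topic1")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, ch := range channels {
		fmt.Println("channel registered for topic1:", ch)
	}
}
```

If both channel1 and channel2 show up in this list, a new nsqd should create both channels when topic1 is first created on it, so channel2 queues messages even before its consumer connects.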

ploxiln avatar Mar 17 '25 02:03 ploxiln

@ploxiln thanks, I found the feature you mentioned.

But by default an nsq consumer polls nsqlookupd every 60s, so it may only find the new nsqd up to 60s later. Messages may therefore not be consumed until about 60s after the producer first publishes.

I think we should let all consumers connect to the nsqd before the producer connects to it.
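One possible way to approximate "consumers first" is to create the topic and its channels on the new nsqd before any producer publishes to it. The sketch below assumes the nsqd HTTP endpoints `POST /topic/create?topic=...` and `POST /channel/create?topic=...&channel=...`. The address `127.0.0.1:4151` and the helper names `createURLs` and `preCreate` are hypothetical. It also assumes that nsqd registers the topic with nsqlookupd once it exists, so consumers can discover the nsqd at their next lookupd poll.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// createURLs builds the nsqd HTTP API URLs that create the topic first
// and then each channel. url.Values.Encode sorts query keys, so the
// output is deterministic.
func createURLs(nsqdHTTPAddr, topic string, channels []string) []string {
	base := "http://" + nsqdHTTPAddr
	topicQuery := url.Values{"topic": {topic}}
	urls := []string{base + "/topic/create?" + topicQuery.Encode()}
	for _, ch := range channels {
		q := url.Values{"topic": {topic}, "channel": {ch}}
		urls = append(urls, base+"/channel/create?"+q.Encode())
	}
	return urls
}

// preCreate (hypothetical helper) issues the create requests in order
// and stops at the first failure.
func preCreate(nsqdHTTPAddr, topic string, channels []string) error {
	for _, u := range createURLs(nsqdHTTPAddr, topic, channels) {
		res, err := http.Post(u, "application/octet-stream", nil)
		if err != nil {
			return err
		}
		res.Body.Close()
		if res.StatusCode != http.StatusOK {
			return fmt.Errorf("%s: unexpected status %s", u, res.Status)
		}
	}
	return nil
}

func main() {
	// Assumed nsqd HTTP address of the newly added node.
	err := preCreate("127.0.0.1:4151", "topic1", []string{"channel1", "channel2"})
	if err != nil {
		fmt.Println("pre-create failed:", err)
		return
	}
	fmt.Println("topic1 with channel1 and channel2 created on the new nsqd")
}
```

Because the channels exist before the first publish, messages should be queued per channel until each consumer connects, rather than being missed by a channel that is created late.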

https://github.com/nsqio/go-nsq/blob/326de60b740b53003cbdc7c477f06f5bedaa521e/config.go#L110

	// Duration between polling lookupd for new producers, and fractional jitter to add to
	// the lookupd pool loop. this helps evenly distribute requests even if multiple consumers
	// restart at the same time
	//
	// NOTE: when not using nsqlookupd, LookupdPollInterval represents the duration of time between
	// reconnection attempts
	LookupdPollInterval time.Duration `opt:"lookupd_poll_interval" min:"10ms" max:"5m" default:"60s"`

penghuazhou avatar Mar 17 '25 08:03 penghuazhou