nchan icon indicating copy to clipboard operation
nchan copied to clipboard

Subscribers doesn't receive message

Open ZinkevychSergii opened this issue 6 years ago • 10 comments

Frequently faced to issue when published messages appears at redis. but subscribers get some old (probably cached) message

upstream redis_cluster {
  nchan_redis_server redis://nchan-redis-server:6379;
  # you don't need to specify all the nodes, they will be autodiscovered
  # however, it's recommended that you do specify at least a few master nodes.
  nchan_redis_storage_mode ${NCHAN_REDIS_STORAGE_MODE};
  nchan_redis_nostore_fastpublish ${NCHAN_REDIS_NOSTORE_FASTPUBLISH};
}

  location ~ /nchan/sub/user/(\w+)$ {
    nchan_channel_id $1;
    nchan_subscriber;
    nchan_subscribe_request /upstream/sub;
    ${NCHAN_SUB_USER_UNSUBSCRIBE_REQUEST_COMMENT}nchan_unsubscribe_request /upstream/unsub;
    nchan_message_buffer_length ${NCHAN_SUB_USER_MESSAGE_BUFFER_LENGTH};
    nchan_subscribe_existing_channels_only ${NCHAN_SUB_USER_SUBSCRIBE_EXISTING_CHANNELS_ONLY};
    nchan_subscriber_first_message ${NCHAN_SUB_USER_SUBSCRIBER_FIRST_MESSAGE};
    nchan_channel_group nchan;
    nchan_redis_pass redis_cluster;
  }

  location = /upstream/sub {
    internal;
    proxy_set_header X-Subscriber-Type $nchan_subscriber_type;
    proxy_set_header X-Message-Id $nchan_message_id;
    proxy_set_header X-Channel-Id $nchan_channel_id;
    proxy_set_header X-Original-URI $request_uri;
    proxy_pass http://******/api/idp/bus/subscribe;
  }

env var: NCHAN_SUB_USER_UNSUBSCRIBE_REQUEST_COMMENT: '#' NCHAN_SUB_USER_MESSAGE_BUFFER_LENGTH: 1 NCHAN_SUB_USER_SUBSCRIBE_EXISTING_CHANNELS_ONLY: 'on' NCHAN_SUB_USER_SUBSCRIBER_FIRST_MESSAGE: 1 NCHAN_REDIS_NOSTORE_FASTPUBLISH: 'on' NCHAN_REDIS_STORAGE_MODE: 'distributed'

response headers

Connection: keep-alive
Content-Length: 964
Date: Fri, 31 May 2019 09:43:11 GMT
Etag: 10
Last-Modified: Fri, 31 May 2019 09:29:31 GMT
Server: nginx/1.15.0
Vary: If-None-Match, If-Modified-Since

one more thing. when i do publish to many message, redis seems to become corrupted. here is screen what insside Screen Shot 2019-05-31 at 6 30 06 PM

seems set nchan_redis_storage_mode to backup helps

ZinkevychSergii avatar May 31 '19 09:05 ZinkevychSergii

Any logged errors? What Nchan version? What is the message publishing rate per channel?

slact avatar Jun 04 '19 21:06 slact

logs pretty much good. on publish successful post request. on subscribe successfull get request nchan version v1.2.5 not sube about rate per channel, where could i found it?

probably more clue for you, when i set nchan_message_buffer_length to 0, issue hasn't reproduced. but i'm interesting to keep just 1 message at redis. for me set nchan_redis_storage_mode to backup mode works fine, i don't need any scaling

ZinkevychSergii avatar Jun 05 '19 05:06 ZinkevychSergii

any guesses here? i have 4 nchan container at k8s and they are connect to redis cluster. when i read stats from XXX/nchan_stub_status it gives mes

total published messages: 140126
stored messages: 65
shared memory used: 68K
shared memory limit: 131072K
channels: 76
subscribers: 295
redis pending commands: 0
redis connected servers: 9
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.2.5

after refresh page it gives totally different numbers. after a few refreshing i get same numbers from first request again. from my point of view nchan instances became corrupted and doesn't fetch info from redis

ZinkevychSergii avatar Jun 25 '19 20:06 ZinkevychSergii

Please check if this issue persists in version 1.2.6

slact avatar Jun 27 '19 15:06 slact

seems to be ok now. but nchan_stub_status still show different values of each request. probably works as design.

@slact one more question, probably not related to this issue. nginx_stub_status shows Active connections: 14723 server accepts handled requests 32150 32150 32881 Reading: 0 Writing: 14718 Waiting: 5

but i know that i have around 300 people online (mob app). and each of them subscribe to max 3 channel

ZinkevychSergii avatar Jul 02 '19 08:07 ZinkevychSergii

@slact Today, version 1.2.6(build time: 2019-7-22), one subscriber cannot receive messages, while other subscribers worked good. There were 30+ messages from pub in redis. I retried many times to make wss connections (same sub account) using chrome & wscat, but nchan DID NOT forward the latest messages to clients. After reloading nginx (/usr/sbin/nginx -s reload), this issue was solved temporarily. Can you give me some suggestions about root cause or configuration ?

xmglin avatar Aug 23 '19 07:08 xmglin

We have the same issue. I think there's a race condition, leading to the error:

  1. client subscribes to channel
  2. nchan starts fetching messages from redis cluster
  3. nchan_subscribe_request handler is called, which POSTs to the channel
  4. this somehow interrupts the message-loading from redis
  5. the channel is broken

This can be reproduced by configuring the nchan_subscribe_request to POST to the channel being subscribed to, in combination with redis(-cluster). There might be other ways to trigger this, too.

Once this starts happening, the redis connected servers and (sometimes?) redis pending commands value starts increasing, too.

neben avatar Sep 02 '19 19:09 neben

I'm looking into this

slact avatar Sep 04 '19 01:09 slact

when do publish, i can see post log 10.20.12.173 - - [19/Sep/2019:14:47:26 +0000] "POST /pub/a5ea6c6dcb56ce8e0ada4efcca11e9deafe851b273caa98c29d23b1ff8b34430?max_msg=1&ttl=0s HTTP/1.1" 201 103 "-" "-" "-"

but when i try to subscribe to this channel via browser, its stuck at pending, and no logs appears

ZinkevychSergii avatar Sep 19 '19 14:09 ZinkevychSergii

This problem persists as of nchan 1.2.7 + nginx 1.19.9. We see all messages(via redis cli subscribe) being published to the redis(-cluster) successfully, yet another client subscribed to the channel doesn't receive them after a while. Restarting the nginx fixes the issue temporarily, but it again manifests within <30 minutes.

yerkanian avatar May 21 '21 09:05 yerkanian