Can't connect to ws server in v0.7.0

Open huahuayu opened this issue 2 years ago • 6 comments

Env

cronos: 0.6.5 and 0.7.0

Issue

Can't subscribe to newHead and newTxs:

SubscribeNewHeads / newTxs from golang

dial tcp <my_ip>:8546: connect: connection reset by peer ws://<my_ip>:8546

Subscribe from wscat

wscat -c ws://<my_ip>:8546
error: connect ECONNRESET <my_ip>:8546

Behavior in v0.6.5

You can connect to ws after restarting, but it only works for 5-10 minutes; then you get the error and need another restart.

Behavior in v0.7.0

Today I upgraded to v0.7.0. The longest run I have recorded is about 1 hour; within that hour I could subscribe to newHead and newTx. I tested it many times, so I thought it had been fixed in v0.7.0.

But when I tried again just now, I couldn't connect to the ws server anymore. Even restarting cronosd doesn't help.

The issue is still there.

@yihuang Please help to check, thx.

huahuayu avatar May 21 '22 08:05 huahuayu

What do you think the problem is? I am willing to dig into the issue, so please share your findings.

huahuayu avatar May 21 '22 08:05 huahuayu

My findings

tendermint/state/txindex/indexer_service.go uses unbuffered channels for the blockHead and tx subscriptions, which may block the sender:

	blockHeadersSub, err := is.eventBus.SubscribeUnbuffered(
		context.Background(),
		subscriber,
		types.EventQueryNewBlockHeader)
	if err != nil {
		return err
	}

	txsSub, err := is.eventBus.SubscribeUnbuffered(context.Background(), subscriber, types.EventQueryTx)
	if err != nil {
		return err
	}

In tendermint/libs/pubsub/pubsub.go I did observe the send of an event msg getting blocked; execution stops at the line marked with -->.

func (state *state) send(msg interface{}, events map[string][]string) error {
	for qStr, clientSubscriptions := range state.subscriptions {
		q := state.queries[qStr].q

		match, err := q.Matches(events)
		if err != nil {
			return fmt.Errorf("failed to match against query %s: %w", q.String(), err)
		}

		if match {
			for clientID, subscription := range clientSubscriptions {
				if cap(subscription.out) == 0 {
					// block on unbuffered channel
-->					subscription.out <- NewMessage(msg, events)
				} else {
					// don't block on buffered channels
					select {
					case subscription.out <- NewMessage(msg, events):
					default:
						state.remove(clientID, qStr, ErrOutOfCapacity)
					}
				}
			}
		}
	}

	return nil
}

Solution

I changed both blockHeadersSub, err := is.eventBus.SubscribeUnbuffered and txsSub, err := is.eventBus.SubscribeUnbuffered to use buffered channels. So far so good; let me keep observing for a while.

huahuayu avatar May 22 '22 07:05 huahuayu

Three days passed, still works. @yihuang

huahuayu avatar May 24 '22 08:05 huahuayu

Three days passed, still works. @yihuang

Awesome, so the issue is a deadlock on the unbuffered channel? Can you open a PR to tendermint directly?

yihuang avatar May 24 '22 08:05 yihuang

I don't know how to effectively reproduce the issue and am not sure if there are other side effects.

huahuayu avatar May 24 '22 11:05 huahuayu

Hi @huahuayu, I think blocking on an unbuffered channel was designed for the indexer service in Tendermint; it guarantees that every event is delivered to the indexer. If the indexer is under heavy I/O load, it will certainly block the pubsub module temporarily.

What are your experimental_websocket_write_buffer_size and experimental_subscription_buffer_size in config.toml? They shouldn't be 0; with non-zero values you get a buffered-channel subscription.

Do you need the indexer service on the node? Maybe you can set it to null and see if this issue still happens.

JayT106 avatar May 24 '22 16:05 JayT106

I think the ws server issue will eventually be fixed by this solution: https://github.com/crypto-org-chain/cronos/issues/665

yihuang avatar Sep 27 '22 02:09 yihuang