v1.0.2->v1.0.4 Websockets issue remains open + new unstable websocket bug in official binaries

Open CRossel87a opened this issue 2 years ago • 7 comments

Hi,

There are currently two open issues I've identified with Cronos. The first is that subscriptions fail to start in self-compiled binaries (see issue A below). I also maintain and work on other Ethermint-based nodes such as Evmos and Canto, which do not have these issues, so I suspect the problem is somewhere in the build configuration or the forked branches. All three nodes are under heavy load with multiple subscriptions.

When using official Cronos binaries, the websocket subscription is unstable: it stops pushing notifications after a few hours while the connection stays open. A forced reconnect will accept a new subscription, but no notifications are pushed after that point.
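
Because the connection stays open in this failure mode, a client cannot rely on a socket error to notice it. One way to detect it is a watchdog that tracks when the last notification arrived. Below is a minimal Python sketch; `StallDetector` and the 60-second default are hypothetical and chosen for illustration, not taken from Cronos or Ethermint.

```python
import time


class StallDetector:
    """Flags a subscription as stalled when no notification arrives in time.

    Hypothetical helper for illustration. The threshold is an assumption and
    should be several times the chain's expected block interval.
    """

    def __init__(self, max_silence_s: float = 60.0, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock
        self._last_seen = clock()

    def mark_notification(self) -> None:
        # Call this each time an eth_subscription message is received.
        self._last_seen = self._clock()

    def is_stalled(self) -> bool:
        # True once the silence exceeds the threshold, even if the socket is open.
        return self._clock() - self._last_seen > self.max_silence_s
```

This only detects the stall. As described above, reconnecting did not bring notifications back.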

Steps to reproduce:

A) Compile
git clone https://github.com/crypto-org-chain/cronos.git
git checkout tags/v1.0.2 (or 3 or 4)
COSMOS_BUILD_OPTIONS=rocksdb make install
make build

Start node, sync up, then:
wscat -c ws://127.0.0.1:8546
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}

Subscriptions of all kinds over websockets are confirmed, but no data is ever pushed afterwards.

Regular JSON-RPC calls over HTTP work.
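
To tell "subscription confirmed" apart from "data pushed" when scripting this check, the frames can be classified by shape. Here is a rough Python sketch; the function names are made up for illustration, and it only handles the JSON text, so the websocket transport is left to whatever client you use.

```python
import json


def build_subscribe_request(kind: str, request_id: int = 1) -> str:
    """Return the JSON-RPC text frame for eth_subscribe, e.g. kind="newHeads"."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": "eth_subscribe", "params": [kind]}
    )


def classify_frame(raw: str) -> str:
    """Classify a frame received on the websocket.

    Returns "confirmation" for the reply carrying the subscription id,
    "notification" for pushed data, and "other" for anything else.
    """
    msg = json.loads(raw)
    if msg.get("method") == "eth_subscription":
        return "notification"
    if "result" in msg and "id" in msg:
        return "confirmation"
    return "other"
```

With the symptom described here, a client sees exactly one "confirmation" frame per subscription and never a "notification" frame.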

CRossel87a avatar Feb 17 '23 18:02 CRossel87a

Hi,

Just a reminder this problem is still present

CRossel87a avatar Apr 03 '23 11:04 CRossel87a

I experienced the same issue a while back. I stopped running Cronos nodes at that point because the effort didn't seem worth it, so I can't speak about the latest versions, just that this bug has been present for at least 9 months. If someone can track this down, I'll give it another try.

kaber2 avatar Apr 26 '23 19:04 kaber2

@CRossel87a It seems to work normally on mainnet. May I ask whether you have tried 1.0.7?

websocat ws://127.0.0.1:8546
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}
{"jsonrpc":"2.0","result":"0xe1843efa56ed3b302452fbc7cbeb52b9","id":1}
{"jsonrpc":"2.0","method":"eth_subscription","params":{"subscription":"0xe1843efa56ed3b302452fbc7cbeb52b9","result":{"parentHash":"0xe8caf282971db1bae5671bf99133730a2b2453e647a1d2bb39ed020c70296d2f","sha3Uncles":"0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347","miner":"0x473e39d1641cae3844a834e69d868a21dbcccd6e","stateRoot":"0xa8347ced8eb44869075842ebc425bf946346f5567d861287c5500aec23f4dc83","transactionsRoot":"0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421","receiptsRoot":"0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421","logsBloom":"0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000","difficulty":"0x0","number":"0x7aeb00","gasLimit":"0x0","gasUsed":"0x0","timestamp":"0x6449e467","extraData":"0x","mixHash":"0x0000000000000000000000000000000000000000000000000000000000000000","nonce":"0x0000000000000000","baseFeePerGas":"0x3b9aca00","hash":"0x885774eaf00a227847693e4dabc19e64c10c62076e50063aac70daf9d6e72869"}}}

mmsqe avatar Apr 27 '23 03:04 mmsqe

I am running v1.0.6 and having the same issue. The Evmos team solved the issue in the latest version of their client; however, I see that Cronos is still running an older version of Ethermint in v1.0.7.

CRossel87a avatar Apr 28 '23 10:04 CRossel87a

So after almost a year, I thought I'd give it another try. The websocket is just as unstable as ever. After maybe 10-20 blocks received (newHeads), I get:

1:50PM ERR Failed to read request err="websocket: close 1006 (abnormal closure): unexpected EOF" module=rpc-server protocol=websocket remote={"Name":"@","Net":"unix"} server=node

At which point the subscription goes silent.

Seriously, does anyone use this? For at least a year now, event subscriptions fail pretty much immediately with no way to recover. The server has massive power and all IPC is local, so this is definitely not related to a slow RPC consumer or something like that.

kaber2 avatar Jun 20 '23 11:06 kaber2

We found the bug. It was in filter_system.go in Ethermint (https://github.com/evmos/ethermint/pull/1773). The RPC interface crashes when you subscribe to more than one topic log at once. I have manually patched my local copy of Cronos and it is now working fine. However, the Cronos team has not applied the fix yet.

@mmsqe

CRossel87a avatar Jun 20 '23 12:06 CRossel87a

@kaber2 Sorry that we haven't released v1.0.10 yet; it will contain the backported bug fixes.

mmsqe avatar Jun 20 '23 12:06 mmsqe

Fixed by https://github.com/crypto-org-chain/cronos/pull/1239

mmsqe avatar Jul 05 '24 08:07 mmsqe