
Timeout from setupNetworks is not propagated

Open x3c41a opened this issue 9 months ago • 10 comments

I am setting up a Polkadot network and overriding the ws timeout with the following script:

import { test } from "bun:test";
import { setupNetworks } from '@acala-network/chopsticks-testing'

test('increased time-out', async () => {
    const { polkadot } = await setupNetworks({
        polkadot: {
            endpoint: 'ws://localhost:9944',
            port: 8000,
            timeout: 300_000_000,
        },
    });

    const sysEntries = await polkadot.api.query.system.account.entries()
    for (const [k] of sysEntries) {
        console.log("key: ", k)
    }
})

After that I try to fetch the System.Account entries and get:

2025-03-25 17:57:46        RPC-CORE: queryStorageAt(keys: Vec<StorageKey>, at?: BlockHash): Vec<StorageChangeSet>:: -32603: Internal Error: No response received from RPC endpoint in 60s

Running chopsticks with LOG_LEVEL=trace shows additional information:

[17:57:46.723] ERROR (ws): Error handling request: 'Error: No response received from RPC endpoint in 60s
    at __internal__timeoutHandlers (/Users/ndk/parity/ahm-dryrun/node_modules/@polkadot/rpc-provider/ws/index.js:503:42)'
    app: "chopsticks"

I checked the @polkadot/rpc-provider/ws implementation, and 60s is the default value. I also double-checked that the timeout I set was actually applied; console.log(polkadot.ws) outputs:

WsProvider {
...
 __internal__timeout: 300000000,
...
}

I started digging deeper and got to the point where the provider for fetching storage is created - link. It does not use the value that I set in the setupNetworks config.

How can I increase the RPC timeout for fetching storage? It fails systematically, so maybe it would be better to implement exponential back-off or a similar retry technique? @xlc @ermalkaleci
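The back-off idea mentioned above could be sketched as a small wrapper around the failing call. Note: `withRetry` is a hypothetical helper written for illustration, not part of chopsticks or polkadot.js.

```typescript
// Hypothetical helper: retry an async operation with exponential back-off.
// Shown only to illustrate the retry suggestion; not part of chopsticks.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 5,
  baseMs = 1_000,
): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      // wait baseMs, 2*baseMs, 4*baseMs, ... before the next attempt
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** attempt))
    }
  }
  throw lastErr
}

// e.g. await withRetry(() => polkadot.api.query.system.account.entries())
```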

x3c41a avatar Mar 25 '25 19:03 x3c41a

Firstly, the timeout is to the chopsticks instance, not to the upstream RPC provider.

Secondly, I would say this is an XY problem.

Increasing the timeout is unlikely to help. The root cause is most likely that Chopsticks is making more RPC requests than the node can handle, and it ends up being unresponsive. The real root cause is that the Substrate node shouldn't be unresponsive at all: if it is truly overloaded and cannot handle the request, it should respond with an error code or reject the connection. I opened an issue here: https://github.com/paritytech/polkadot-sdk/issues/8035

On the Chopsticks side, we could try reducing the batch size, and maybe that can help a bit: https://github.com/AcalaNetwork/chopsticks/blob/462fdc458f424e4cb68aeeb42825cc79299f9ca8/packages/chopsticks/src/utils/fetch-storages.ts#L17

Can you reduce the batch size and see if it can avoid the RPC timeout issue?
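The batching idea above is just to cap how many keys go into each storage request so the response stays bounded. A rough sketch of what that looks like (the `BATCH_SIZE` name mirrors the linked file, but the function and `SendFn` type here are illustrative, not the actual chopsticks code):

```typescript
// Illustration of the batching idea: split keys into chunks so each
// state_queryStorageAt request (and its response) stays bounded.
// Not the actual chopsticks implementation.
const BATCH_SIZE = 100 // try values below the default 1000

type SendFn = (method: string, params: unknown[]) => Promise<unknown>

async function fetchStorageInBatches(
  send: SendFn,
  keys: string[],
  at: string,
): Promise<unknown[]> {
  const results: unknown[] = []
  for (let i = 0; i < keys.length; i += BATCH_SIZE) {
    const chunk = keys.slice(i, i + BATCH_SIZE)
    // each request carries at most BATCH_SIZE keys
    results.push(await send('state_queryStorageAt', [chunk, at]))
  }
  return results
}
```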

xlc avatar Mar 25 '25 22:03 xlc

@xlc I think it is exceeding the response size

ermalkaleci avatar Mar 26 '25 08:03 ermalkaleci

I forked the chopsticks repo and manually linked it to my script via package.json. However, when I try to run my script with the new chopsticks, the compiler keeps throwing errors or being unable to find modules, e.g.

error: error: Cannot find module '@acala-network/chopsticks-executor' from '/Users/ndk/parity/x3c41a/chopsticks/packages/core/src/wasm-executor/node-wasm-executor.js'
      at emitError (node:worker_threads:205:13)

Note that I have pretty limited experience with JS/TS. @ermalkaleci, have you checked batch sizes, or could you please help me with that if you have done this before?

x3c41a avatar Mar 26 '25 10:03 x3c41a

you need to run yarn build-wasm and maybe yarn build. You can also just add a unit test (or modify an existing one) in this repo to run some testing code

xlc avatar Mar 26 '25 10:03 xlc

I tried reducing BATCH_SIZE and fetching the keys - same problem.

I tried tweaking the local node and adding flags from the most recent changes (see https://github.com/paritytech/polkadot-sdk/pull/7994). I am still getting the same error; none of these helped:

  1. Adding --rpc-rate-limit 12345678 100000000 for both the omni-node and the regular polkadot node. Note: if you don't specify --rpc-rate-limit X, the node doesn't enable any rate-limiting at all. Commands for reference below:
     • ./target/release/polkadot-omni-node --chain /Users/ndk/parity/polkadot-sdk/cumulus/polkadot-parachain/chain-specs/asset-hub-polkadot.json --sync "warp" --database rocksdb  --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 100000000 --rpc-max-response-size 100000000 --rpc-port 9945 --rpc-rate-limit 12345678 -- --sync "warp" --database rocksdb --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 100000000 --rpc-max-response-size 100000000 --rpc-port 9944 --rpc-rate-limit 12345678
     • polkadot --sync warp --state-pruning 1000 --blocks-pruning 1000 --rpc-rate-limit 12345678 --tmp --rpc-port 9944 --rpc-cors all
  2. Adding --rpc-message-buffer-capacity-per-connection 4294967295 to the polkadot node.
  3. Running BATCH_SIZE values of 1000 (the current one), 400, 200 and 100 against all the cases mentioned above; none of them worked.

@xlc do you have any other ideas on how this RPC connection issue might be fixed? I was still seeing responses of length 1000 even when I changed BATCH_SIZE, but I believe the unchanged response size was due to pageSize.

x3c41a avatar Mar 26 '25 17:03 x3c41a

https://github.com/AcalaNetwork/chopsticks/pull/898 this should help

will try to reproduce the issue locally and see what exactly is causing the problem

xlc avatar Mar 27 '25 03:03 xlc

@xlc did you have time to reproduce it?

x3c41a avatar Mar 27 '25 19:03 x3c41a

this is what I did, and it all works without any issues:

run a polkadot node in the polkadot-sdk repo: cargo run --release -p polkadot -- --chain=polkadot --sync warp --no-hardware-benchmarks --rpc-port 9944 --rpc-max-response-size 100000000

fetch storage: yarn start fetch-storages '0x' --db polkadot.sqlite --endpoint ws://localhost:9944 --block 25323705

I am getting a 2GB db

await polkadot.api.query.system.account.entries() — this code reads all the accounts into memory, and that is going to take forever regardless.

I am running this code and it is working fine

const { polkadot } = await setupNetworks({
  polkadot: {
    endpoint: 'ws://localhost:9944',
    block: 25323705,
    db: 'polkadot.sqlite',
    port: 8000,
    'build-block-mode': BuildBlockMode.Manual,
  }
})

console.log('setup')

let startKey = '0x'
while (true) {
  // fetch accounts one page at a time instead of all at once
  const sysEntries = await polkadot.api.query.system.account.entriesPaged({ pageSize: 1000, args: [], startKey })
  for (const [k] of sysEntries) {
    console.log("key: ", k.toHuman())
    startKey = k.toHex()
  }
  if (sysEntries.length < 1000) {
    break
  }
}

I cannot reproduce any timeout issue

xlc avatar Mar 28 '25 00:03 xlc

dumb question:

why is this going to take forever?

await polkadot.api.query.system.account.entries() — this code reads all the accounts into memory, and that is going to take forever regardless.

I did the same with the smaller dataset of rcAccounts and it worked perfectly fine

x3c41a avatar Mar 28 '25 05:03 x3c41a

just because there are a lot of accounts on polkadot

xlc avatar Mar 28 '25 05:03 xlc