
Prefetched DBs are not the same when using npx and setupNetworks

Open x3c41a opened this issue 9 months ago • 31 comments

There is a difference in behaviour between running npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=wss://polkadot.rpc.permanence.io --config=configs/polkadot.yml

and doing the same thing using the setupNetworks function:

const {polkadot, assetHub} = await setupNetworks({
        polkadot: {
            endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
            'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
            'prefetch-storages': ['0x'],
            db: './dbs/polkadot.sqlite',
            port: 8000,
        },
});

From my understanding the behaviour should be the same, but the first db is around 2 GB while the second one is about 140 MB. I was not able to get to the root cause of the problem, so I'm posting it here.

x3c41a avatar Mar 20 '25 15:03 x3c41a

They should be the same. You should be able to use an sqlite viewer to open the db and check the content. You can check the KeyValuePair table to count the number of rows for the given blockHash and see if the second one is missing data.

I don't think the state should take 2 GB, so it is possible the first db contains some other data?
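For example, something along these lines would show how many key-value pairs are stored per block hash (a quick sketch; in the sqlite files discussed later in this thread the table shows up as key_value_pair):

-- count key-value pairs per block hash in the chopsticks sqlite db
SELECT blockHash, COUNT(*) AS pairs
FROM key_value_pair
GROUP BY blockHash;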

xlc avatar Mar 20 '25 21:03 xlc

The 2 GB one is generated fresh (the sqlite file doesn't exist beforehand) from:

npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=ws://localhost:9944

Port 9944 has a freshly warp-synced node spun up with:

    polkadot-omni-node --chain asset-hub-polkadot.json --sync "warp" --database paritydb --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 1000000 --rpc-max-response-size 1000000 --rpc-port 9945 -- --sync "warp" --database paritydb --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 1000000 --rpc-max-response-size 1000000 --rpc-port 9944

Is it downloading previous state too? Or something to do with paritydb? Not sure what else it would be.

I'll try with rocksdb and see if it makes a difference in size - @x3c41a which db was your local node running with?

seadanda avatar Mar 21 '25 16:03 seadanda

Nope, a local warp-synced node with rocksdb is also 2 GB

Running against polkadot-rpc.dwellir.com is still going slowly but is already past 1.6 GB

SELECT * FROM key_value_pair GROUP BY "blockHash"; on the 2 GB db gives me one block hash, which tells me that it's just pulling the most recent state. The DB has about 2.5 million KV pairs, so 2 GB looks right

seadanda avatar Mar 21 '25 17:03 seadanda

How the node is synced shouldn't matter (otherwise it would be a bug in substrate)

To confirm my understanding: the fetch-storages CLI is working, and 'prefetch-storages': ['0x'] passed to setupNetworks isn't?

I suspect the difference is that the download takes a while, and the script that's using setupNetworks exits before everything is downloaded. setupNetworks doesn't offer any API for you to wait for the fetch to be completed. So you should try to call this instead: https://github.com/AcalaNetwork/chopsticks/blob/462fdc458f424e4cb68aeeb42825cc79299f9ca8/packages/chopsticks/src/utils/fetch-storages.ts#L126
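A rough sketch of what that could look like, awaited before the tests start (the import path, export name, and option names below are assumptions based on the linked file, not verified against it; the block number is the one used later in this thread):

// Hypothetical sketch only: check fetch-storages.ts for the real export and signature.
import { fetchStorages } from '@acala-network/chopsticks/utils/fetch-storages'

await fetchStorages({
  block: 25284232,                  // pin the block the tests will run against
  endpoint: 'ws://localhost:9944',
  dbPath: './dbs/polkadot.sqlite',  // same db the setupNetworks config points at
  config: ['0x'],                   // prefixes / pallets to prefetch
})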

xlc avatar Mar 24 '25 02:03 xlc

I'll try with rocksdb and see if it makes a difference in size - @x3c41a which db was your local node running with?

I was not using a local node while pre-fetching the storage; instead I used the wss://polkadot-rpc.dwellir.com endpoint. Let me double-check the resulting db with a local node setup. I'll use the same omni-node command for spinning up the network.

x3c41a avatar Mar 24 '25 12:03 x3c41a

So I:

  • spun up a local network using the command above (with paritydb),
  • generated pre-fetched db with npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=ws://localhost:9944 --config=configs/polkadot.yml
  • and tried to fetch all RC account keys using the following script:
    const {polkadot, assetHub} = await setupNetworks({
        polkadot: {
            endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
            'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
            db: './dbs/polkadot.sqlite',
            port: 8000,
        },
        assetHub: {
            endpoint: process.env.POLKADOT_ASSET_HUB_ENDPOINT || 'ws://localhost:9945',
            'wasm-override': 'runtime_wasm/asset_hub_polkadot_runtime.compact.compressed.wasm',
            db: './dbs/polkadot-asset-hub.sqlite',
            port: 8001,
        },
    });

    let sysEntries = await polkadot.api.query.system.account.entries()
    for (const [k,v] of sysEntries) {
        console.log("key: ", k)
        console.log("value: ", v)
    }

and it kept fetching the storage until the following error occurred:

2025-03-24 14:54:51        RPC-CORE: queryStorageAt(keys: Vec<StorageKey>, at?: BlockHash): Vec<StorageChangeSet>:: No response received from RPC endpoint in 60s

I enabled LOG_LEVEL=trace and saw the logs:

[14:47:17.620] TRACE (layer): RemoteStorageLayer getKeysPaged
    app: "chopsticks"
    at: "0x94258d8069f740a4d1780b6cd600436f3f9d1c547a2cf77cb9910128346ccd90"
    prefix: "0x0f6738a0ee80c8e74cd2c7417c1e25569613e9bbc07e304aa9a1af9b85898e5a"
    pageSize: 1
    startKey: "0x0f6738a0ee80c8e74cd2c7417c1e25569613e9bbc07e304aa9a1af9b85898e5a1d99db4852f6327367cecc483e14f392bf030000"
[14:47:17.620] TRACE (layer): RemoteStorageLayer get
    app: "chopsticks"
    at: "0x94258d8069f740a4d1780b6cd600436f3f9d1c547a2cf77cb9910128346ccd90"

which suggests that chopsticks was trying to fetch account keys from my local node, not from the db that I provided via the db argument.

My conclusion is that either the pre-fetched storage, the one that we consider "the right one", does not contain the pre-fetched keys, or I misconfigured my script (though it looks reasonable).

I'll experiment with both the node's --database type (I'll try rocksdb) and the pre-fetch prefix (I'll use "System." as an argument) and see what happens. I'll post my findings here, stay tuned.

x3c41a avatar Mar 24 '25 15:03 x3c41a

Same thing happens when the db is populated with this command: npx @acala-network/chopsticks@latest fetch-storages Balances System --endpoint=ws://localhost:9944 --config=configs/polkadot.yml

I'll try rocksdb now. UPD: same problem

x3c41a avatar Mar 24 '25 15:03 x3c41a

I was thinking about potential ways to understand why chopsticks does not use the pre-fetched keys and queries the node instead. The problem is either with [0] the db prefetch -- it might download wrong values or an incomplete set of them -- or with [1] the chopsticks implementation, which ignores existing keys and fetches the same keys again.

As for the former problem ([0]), I have the following set of dbs right now:

  1. pre-fetched storage for the System.Account storage item alone
  2. pre-fetched storage for Balances and System pallets
  3. pre-fetched storage with prefix '0x'
  4. pre-fetched storage with prefix '0x' + part of the keys that chopsticks fetched before the RPC endpoint stopped responding

I can also pre-fetch any set of pallets needed for testing, but let's think about what information we can extract from the existing ones.

Idea

If I subtract the key-values of №3 from №4, I'll get the keys that chopsticks fetched during the script execution. If I compare the result of this with №1, I'll get to know whether chopsticks is trying to fetch the same keys that are in the System pallet.

Execution

To subtract №3 from №4 I created an empty sqlite db under the chopsticks-db folder, opened the №4 DB and executed the following SQL commands:

  • attached both the №3 db (the 0x one) and the resulting db with:
ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/0x-dbs/polkadot.sqlite' AS db_0x; 
ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/chopsticks-db/polkadot.sqlite' AS db_chopsticks;
  • inserted the diff between the two into the resulting one:
INSERT INTO db_chopsticks.key_value_pair (blockHash, key, value)
   SELECT k2.blockHash, k2.key, k2.value
   FROM key_value_pair k2
   LEFT JOIN db_0x.key_value_pair k1
   ON k2.blockHash = k1.blockHash AND k2.key = k1.key
   WHERE k1.blockHash IS NULL;
  • checked whether the row-count diff (SELECT COUNT(*) FROM key_value_pair) between the two equals the row count of the resulting db -- yes, it does (a sketch of the check is below).
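Concretely, that check amounts to something like this, run from inside the №4 db with both other dbs attached as above:

-- expectation: n4_rows - n3_rows = diff_rows
SELECT (SELECT COUNT(*) FROM key_value_pair)               AS n4_rows,
       (SELECT COUNT(*) FROM db_0x.key_value_pair)         AS n3_rows,
       (SELECT COUNT(*) FROM db_chopsticks.key_value_pair) AS diff_rows;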

To verify whether the resulting db contains the same keys as №1, I:

  • opened the №1 db
  • attached the resulting db: ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/chopsticks-db/polkadot.sqlite' AS db_chopsticks; and ran:
SELECT COUNT(*)
   FROM db_chopsticks.key_value_pair k1
   LEFT JOIN key_value_pair k2
   ON k1.blockHash = k2.blockHash AND k1.key = k2.key
   WHERE k2.blockHash IS NULL;
  • the result equals the number of entries in the resulting db, which means that none of the resulting db's keys are present in the System.Account db.

Conclusion

After all the computation above, I think that:

  • either the System.Account pre-fetch fetches the wrong data,
  • or the script:
let sysEntries = await polkadot.api.query.system.account.entries()
    for (const [k,v] of sysEntries) {
        console.log("key: ", k)
        console.log("value: ", v)
    }

is written wrong and tries to access keys other than System.Account,

  • or there is a bug in the chopsticks implementation.

Verify whether it's the script or the pre-fetch

I can create a new db by fetching a single account using the existing 0x pre-fetched db and check the diff between the two:

  • I copied the 0x pre-fetched db
  • used this script:
const {polkadot, assetHub} = await setupNetworks({
        polkadot: {
            endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
            'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
            db: './dbs/polkadot.sqlite',
            port: 8000,
        },
});
let sysEntries = await polkadot.api.query.system.account('5Ee7mSUN7p9YGqzthB1uCbQPMo9zC2Z2Yv5b2nsHKDzmtseR')
console.log(sysEntries);
  • It generated another sqlite3 db, which I moved into the 0x-plus-one-account folder
  • I opened that new db: sqlite3 0x-plus-one-account/polkadot.sqlite
  • attached raw 0x db: ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/0x-dbs/polkadot.sqlite' AS db_0x_db;
  • ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/one-account-db/polkadot.sqlite' AS db_result;
  • inserted the diff between the two:
sqlite> INSERT INTO db_result.key_value_pair (blockHash, key, value)
   ...> SELECT k1.blockHash, k1.key, k1.value
   ...> FROM key_value_pair k1
   ...> LEFT JOIN db_0x_db.key_value_pair k2
   ...> ON k1.blockHash = k2.blockHash AND k1.key = k2.key
   ...> WHERE k2.blockHash IS NULL;
  • the resulting db contains 3315 rows. I suspect that's because of chopsticks' pagination?..
  • I enter the system-accounts db: sqlite3 system-accounts-db/polkadot.sqlite
  • attach the newly created db: ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/one-account-db/polkadot.sqlite' AS db_result;
  • find the keys from the result db that are NOT present in the system-accounts db:
sqlite> SELECT COUNT(*)
   ...> FROM db_result.key_value_pair r
   ...> LEFT JOIN key_value_pair s
   ...> ON r.blockHash = s.blockHash AND r.key = s.key
   ...> WHERE s.blockHash IS NULL;
  • the output is 3315

Final conclusion

I verified both single-account and all-accounts fetching using chopsticks. I believe I wrote the script correctly. @xlc, LMK if I did something wrong. None of the keys fetched by chopsticks is present in the pre-fetched System.Account db. Maybe I need to pre-fetch another database, e.g. just System. If that's not the case, then I believe chopsticks does something strange while fetching the storage, unless I made a silly mistake along the way...

@xlc @seadanda eager to hear your thoughts on this!

x3c41a avatar Mar 24 '25 18:03 x3c41a

Thanks for the investigation. I will try to debug this. Also note that you must specify a block hash for the pre-fetch db to work. The values are stored under $blockHash-$key. Your fetching script should also fetch keys for a particular block instead of the latest head, which is moving and might have key changes, causing unexpected results.
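For example, with polkadot.js the queries can be pinned to one block along these lines (a sketch, assuming the polkadot handle from the setupNetworks snippets above; the block number is the one used later in this thread):

// query System.Account at a fixed block instead of the moving head
const blockHash = await polkadot.api.rpc.chain.getBlockHash(25284232)
const apiAt = await polkadot.api.at(blockHash)
const sysEntries = await apiAt.query.system.account.entries()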

xlc avatar Mar 24 '25 21:03 xlc

I tried with an internal testnet with chopsticks fetch-storages '0x' --endpoint ws://localhost:9944 --block 618266 and chopsticks --config configs/test.yml with

endpoint: ws://localhost:9944
mock-signature-host: true
block: 618266
db: ./roc-test.sqlite
prefetch-storages:
  - 0x

and they both yield the same result

xlc avatar Mar 25 '25 01:03 xlc

The problem is a bit different. When I

  • pre-fetch storage with one of the commands above, let's say npx @acala-network/chopsticks@latest --config=configs/polkadot.yml with this config:
endpoint: ws://localhost:9944
mock-signature-host: true
block: 25284232
db: ./dbs/polkadot.sqlite
wasm-override: ./runtime_wasm/polkadot_runtime.compact.compressed.wasm
prefetch-storages:
  - 0x

^ not sure if wasm-override can cause any issues, I guess not.

  • and try to re-use the pre-fetched db within the following script (to list all the system account keys):
import { test } from "bun:test";
import { setupNetworks } from '@acala-network/chopsticks-testing'

test('test account fetching', async() => {
    const { polkadot } = await setupNetworks({
        polkadot: {
            endpoint: 'ws://localhost:9944',
            'mock-signature-host': true,
            block: 25284232,
            'wasm-override': './runtime_wasm/polkadot_runtime.compact.compressed.wasm',
            db: './dbs/polkadot.sqlite',
            port: 8000,
        },
    });

    let sysEntries = await polkadot.api.query.system.account.entries();
    for (const [k,v] of sysEntries) {
        console.log("key: ", k)
    }
});

chopsticks starts fetching the keys once again, i.e. it does not re-use the pre-fetched db, and eventually fails with:

2025-03-25 06:55:26        RPC-CORE: getKeysPaged(key: StorageKey, count: u32, startKey?: StorageKey, at?: BlockHash): Vec<StorageKey>:: No response received from RPC endpoint in 60s

@xlc could you try to execute a script that fetches all the system account keys (like the one that I pasted) and see if it re-uses the pre-fetched db or not?

x3c41a avatar Mar 25 '25 06:03 x3c41a

The DB is used only to get values, not key ordering. getKeysPaged will hit the RPC.

ermalkaleci avatar Mar 25 '25 09:03 ermalkaleci

Thanks for the input @ermalkaleci! Could you elaborate on key ordering? I am not sure I get this right; maybe some code reference? Is there a workaround to make chopsticks read values from the pre-fetched db, or to disable hitting the RPC?

x3c41a avatar Mar 25 '25 11:03 x3c41a

Unfortunately no, Chopsticks needs an RPC or genesis to run. The db is a caching layer so you don't hit the RPC to read state all the time. Chopsticks cannot know how to find the next key, so it uses getKeysPaged; that is why you will see logs about getKeysPaged and RPC calls. If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.
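In other words, the flow described here is roughly the following (a simplified sketch for illustration, not the actual chopsticks code; Cache and Rpc are hypothetical interfaces):

// Values are served from the cache when present; misses fall through to the RPC.
// Key enumeration (getKeysPaged) is never cached, so it always hits the RPC.
interface Cache {
  get(blockHash: string, key: string): Promise<string | null | undefined>
  put(blockHash: string, key: string, value: string | null): Promise<void>
}
interface Rpc {
  getStorage(key: string, at: string): Promise<string | null>
  getKeysPaged(prefix: string, pageSize: number, startKey: string, at: string): Promise<string[]>
}

async function getStorage(db: Cache, rpc: Rpc, at: string, key: string) {
  const cached = await db.get(at, key)        // stored under $blockHash-$key
  if (cached !== undefined) return cached     // prefetched or previously fetched value
  const value = await rpc.getStorage(key, at) // cache miss: hit the RPC
  await db.put(at, key, value)
  return value
}

async function nextKeys(rpc: Rpc, at: string, prefix: string, pageSize: number, startKey: string) {
  return rpc.getKeysPaged(prefix, pageSize, startKey, at) // always an RPC call
}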

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

If you prefetch state for block A then you need to run chopsticks starting with block A, otherwise what you downloaded is useless

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

I don't see any block specified in the previous comments, so if you run those scripts multiple times you will end up fetching state for different blocks; that might be one explanation for why you get different db sizes

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

I don't see any block specified in the previous comments, so if you run those scripts multiple times you will end up fetching state for different blocks; that might be one explanation for why you get different db sizes

I do have it specified in both the script:

polkadot: {
            endpoint: 'ws://localhost:9944',
            'mock-signature-host': true,
            block: 25284232, <----------------------- HERE
            'wasm-override': './runtime_wasm/polkadot_runtime.compact.compressed.wasm',
            db: './dbs/polkadot.sqlite',
            port: 8000,
        },

and within the yaml config:

endpoint: ws://localhost:9944
mock-signature-host: true
block: 25284232 <----------------------- HERE
db: ./dbs/polkadot.sqlite

x3c41a avatar Mar 25 '25 11:03 x3c41a

let sysEntries = await polkadot.api.query.system.account.entries();

This will get all the keys from the RPC, and it looks like RPC rate limiting is disconnecting you.

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

endpoint: 'ws://localhost:9944',

are you running the node locally?

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

let sysEntries = await polkadot.api.query.system.account.entries();

This will get all the keys from the RPC, and it looks like RPC rate limiting is disconnecting you.

Correct me if my logic is wrong:

  • I pre-fetch the state at block 25284232,
  • I set up the network at block 25284232
  • I start fetching all the system account keys. I suppose the cache layer already contains all the keys at this block, doesn't it?

are you running the node locally?

Yes

x3c41a avatar Mar 25 '25 11:03 x3c41a

Chopsticks uses the db to cache the values; finding the next key will always hit the RPC

ermalkaleci avatar Mar 25 '25 11:03 ermalkaleci

Chopsticks uses the db to cache the values; finding the next key will always hit the RPC

Don't I find all the keys during the pre-fetch? I am trying to understand what values it caches and what "finding the next key" means.

If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.

I thought pre-fetching '0x' pre-fetches all the keys, including the system.account keys, so I don't have to hit the RPC node at all, unless new accounts are created in the meantime, but their number should be negligible...

x3c41a avatar Mar 25 '25 11:03 x3c41a

Unfortunately no, Chopsticks needs an RPC or genesis to run. The db is a caching layer so you don't hit the RPC to read state all the time. Chopsticks cannot know how to find the next key, so it uses getKeysPaged; that is why you will see logs about getKeysPaged and RPC calls. If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.

Maybe the other way around. My goal is to run the account migration from the Relay Chain to Asset Hub.

My idea was:

  1. Pre-fetch the storage for the Relay (using '0x' prefix),
  2. Override the wasm,
  3. Save all the account keys in memory. Here I supposed they would be taken from the pre-fetched db, since I did not create any new block and everything was already pre-fetched in the db.
  4. Run the migration. The migration iterates over the existing accounts, withdraws the funds and sends them over to AH using XCM.
  5. Read Asset Hub's and the Relay's storage. Here the values might not be pre-fetched, and RPC calls will be needed either way.

So anyway, I'll have a problem fetching all the account keys, either in step 3 or in step 5. It means I need a reliable way of fetching all the key-values from an endpoint (either local or remote), which brings me back to the problem of:

2025-03-25 06:55:26        RPC-CORE: getKeysPaged(key: StorageKey, count: u32, startKey?: StorageKey, at?: BlockHash): Vec<StorageKey>:: No response received from RPC endpoint in 60s

The error comes from this function, and I am wondering whether that's a client- or server-side error. While I am getting familiar with the code, @ermalkaleci maybe you know where the rpc timeout parameter is set? I guess it uses a default value...

x3c41a avatar Mar 25 '25 12:03 x3c41a

60s is high enough.

Why do you need to test all accounts? Why don't you override system.account with some dummies and test with them?

ermalkaleci avatar Mar 25 '25 19:03 ermalkaleci

I need automated tests for running the Asset Hub migration, which touches (almost) all the Relay Chain accounts. Thus I want to override the wasm and emulate the real migration to see what happens along the way, and compare the chain (RC and AH) states before and after the migration.

x3c41a avatar Mar 25 '25 20:03 x3c41a

It is real, just with a fraction of the accounts. You can query some accounts, put them in a storage override, and run your migration. It will be enough to test the migration; I don't see why you need to do it with all accounts.
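For instance, a minimal sketch of such an override via chopsticks' import-storage config (the address and balance are placeholders, and the exact key/value layout should be double-checked against the chopsticks storage-override docs):

# sketch only: inject a dummy account via import-storage
import-storage:
  System:
    Account:
      -
        -
          - 5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY  # placeholder address
        - providers: 1
          data:
            free: '1000000000000000'  # placeholder free balance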

ermalkaleci avatar Mar 25 '25 20:03 ermalkaleci

That's an option too! However, when I was running the migration using plenty of accounts (without fetching their state) I found two bugs related to overweight. We aim to test and dry-run the migration in an environment that closely mirrors the real one.

x3c41a avatar Mar 25 '25 20:03 x3c41a

If increasing the timeout is an issue, then how about retrying transient errors a few times?
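For what it's worth, a retry around a paginated fetch could look roughly like this (a sketch using polkadot.js entriesPaged; the page size and retry count are arbitrary):

import { ApiPromise } from '@polkadot/api'

// Page through System.Account, retrying each page a few times on transient errors.
async function fetchAllAccounts(api: ApiPromise) {
  const pageSize = 500 // arbitrary
  const maxRetries = 3 // arbitrary
  let startKey: string | undefined
  const all: any[] = []
  while (true) {
    let page: any[] = []
    for (let attempt = 1; ; attempt++) {
      try {
        page = await api.query.system.account.entriesPaged({ args: [], pageSize, startKey })
        break
      } catch (e) {
        if (attempt >= maxRetries) throw e // give up after a few transient failures
      }
    }
    if (page.length === 0) break
    all.push(...page)
    startKey = page[page.length - 1][0].toHex() // resume from the last key of this page
  }
  return all
}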

x3c41a avatar Mar 25 '25 20:03 x3c41a

I don't think the timeout is the issue here. The RPC should respond really fast, especially if you're running it locally, unless there's some error with polkadot/api

ermalkaleci avatar Mar 25 '25 21:03 ermalkaleci

I just figured out the prefetch-storages feature is useless; chopsticks will prefetch storage for all the keys it receives

ermalkaleci avatar Mar 26 '25 08:03 ermalkaleci