Prefetched DBs are not the same when using npx and setupNetworks
There is a difference in behaviour of
npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=wss://polkadot.rpc.permanence.io --config=configs/polkadot.yml
and doing the same thing using the setupNetworks function:
const {polkadot, assetHub} = await setupNetworks({
polkadot: {
endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
'prefetch-storages': ['0x'],
db: './dbs/polkadot.sqlite',
port: 8000,
},
},
});
From my understanding the behaviour should be the same, but the first db is around 2 GB while the second one is only about 140 MB. I was not able to get to the root cause of the problem, so I'm posting it here.
They should be the same. You should be able to use a sqlite viewer to view the db and check the content. You can check the KeyValuePair table to count the number of rows for the given block hash and see if the second one is missing data.
I don't think the state should take 2 GB, so it is possible the first db contains some other data?
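For reference, a minimal sketch of that check using bun's built-in sqlite driver (the test scripts later in this thread run under bun); the key_value_pair table and blockHash column names match the SQL used further down:

```ts
// Count key-value rows per block hash in a prefetched chopsticks db.
import { Database } from 'bun:sqlite'

const db = new Database('./dbs/polkadot.sqlite', { readonly: true })
const rows = db
  .query('SELECT blockHash, COUNT(*) AS n FROM key_value_pair GROUP BY blockHash')
  .all()
console.log(rows) // e.g. [{ blockHash: '0x…', n: 2500000 }] for a full '0x' prefetch
db.close()
```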
The 2 GB one is generated fresh (the sqlite file didn't exist previously) from:
npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=ws://localhost:9944
9944 has a freshly warp-synced node spun up with:
polkadot-omni-node --chain asset-hub-polkadot.json --sync "warp" --database paritydb --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 1000000 --rpc-max-response-size 1000000 --rpc-port 9945 -- --sync "warp" --database paritydb --blocks-pruning 600 --state-pruning 600 --no-hardware-benchmarks --rpc-max-request-size 1000000 --rpc-max-response-size 1000000 --rpc-port 9944
Is it downloading previous state too? Or something to do with paritydb? Not sure what else it would be.
I'll try with rocksdb and see if it makes a difference in size - @x3c41a which db was your local node running with?
Nope, a local warp-synced node with rocksdb is also 2 GB.
Running against polkadot-rpc.dwellir.com is still going slowly but is already past 1.6 GB.
SELECT * FROM key_value_pair GROUP BY "blockHash"; on the 2 GB db gives me one block hash, which tells me that it's just pulling the most recent state. The db has about 2.5 million KV pairs, so 2 GB looks right.
How the node is synced shouldn't matter (otherwise it would be a bug in Substrate).
To confirm my understanding: the fetch-storages CLI is working, and 'prefetch-storages': ['0x'] passed to setupNetworks isn't?
I suspect the difference is that the download takes a while, and the script that's using setupNetworks exits before everything is downloaded. setupNetworks doesn't offer any API for you to wait for the fetch to be completed. So you should try to call this instead: https://github.com/AcalaNetwork/chopsticks/blob/462fdc458f424e4cb68aeeb42825cc79299f9ca8/packages/chopsticks/src/utils/fetch-storages.ts#L126
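A hedged sketch of what that could look like; the export name (fetchStorages), the import path, and the option names below are assumptions, so check the linked fetch-storages.ts for the exact signature:

```ts
// Sketch only: call the fetch-storages helper directly and await it, instead of relying on
// setupNetworks' prefetch-storages option. The export name, import path and option names are
// assumptions taken from the linked file - verify against the actual source.
import { fetchStorages } from '@acala-network/chopsticks/utils/fetch-storages'

await fetchStorages({
  endpoint: 'ws://localhost:9944',
  block: 25284232,                 // pin the block you want to prefetch (number is illustrative)
  dbPath: './dbs/polkadot.sqlite',
  config: ['0x'],                  // storage prefixes / pallet names to fetch
})
// only let the script exit after this promise resolves, so the db is fully populated
```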
> I'll try with rocksdb and see if it makes a difference in size - @x3c41a which db was your local node running with?
I was not using a local node while pre-fetching the storage; instead I used the wss://polkadot-rpc.dwellir.com endpoint. Let me double-check the resulting db with a local node setup. I'll use the same omni-node command for spinning up the network.
So I:
- spun up a local network using the command above (with paritydb),
- generated the pre-fetched db with npx @acala-network/chopsticks@latest fetch-storages '0x' --endpoint=ws://localhost:9944 --config=configs/polkadot.yml,
- and tried to fetch all RC accounts' keys using the following script:
const {polkadot, assetHub} = await setupNetworks({
polkadot: {
endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
db: './dbs/polkadot.sqlite',
port: 8000,
},
assetHub: {
endpoint: process.env.POLKADOT_ASSET_HUB_ENDPOINT || 'ws://localhost:9945',
'wasm-override': 'runtime_wasm/asset_hub_polkadot_runtime.compact.compressed.wasm',
db: './dbs/polkadot-asset-hub.sqlite',
port: 8001,
},
});
let sysEntries = await polkadot.api.query.system.account.entries()
for (const [k,v] of sysEntries) {
console.log("key: ", k)
console.log("value: ", v)
}
and it kept fetching the storage until the following error occurred:
2025-03-24 14:54:51 RPC-CORE: queryStorageAt(keys: Vec<StorageKey>, at?: BlockHash): Vec<StorageChangeSet>:: No response received from RPC endpoint in 60s
I enabled LOG_LEVEL=trace and saw the logs:
[14:47:17.620] TRACE (layer): RemoteStorageLayer getKeysPaged
app: "chopsticks"
at: "0x94258d8069f740a4d1780b6cd600436f3f9d1c547a2cf77cb9910128346ccd90"
prefix: "0x0f6738a0ee80c8e74cd2c7417c1e25569613e9bbc07e304aa9a1af9b85898e5a"
pageSize: 1
startKey: "0x0f6738a0ee80c8e74cd2c7417c1e25569613e9bbc07e304aa9a1af9b85898e5a1d99db4852f6327367cecc483e14f392bf030000"
[14:47:17.620] TRACE (layer): RemoteStorageLayer get
app: "chopsticks"
at: "0x94258d8069f740a4d1780b6cd600436f3f9d1c547a2cf77cb9910128346ccd90"
which suggests that chopsticks was trying to fetch account keys from my local node, not from the db I provided via the db argument.
My conclusion is that either the pre-fetched storage, the one that we consider "the right one", does not contain the pre-fetched keys, or I misconfigured my script (though it looks reasonable).
I'll experiment with both the node's database type (I'll try rocksdb) and the pre-fetch prefix (I'll use "System." as an argument) and see what happens. I'll post my findings here, stay tuned.
Same thing happens when the db is populated with this command:
npx @acala-network/chopsticks@latest fetch-storages Balances System --endpoint=ws://localhost:9944 --config=configs/polkadot.yml
I'll try rocksdb now. UPD: same problem.
I was thinking about potential ways to understand why chopsticks does not use the pre-fetched keys and queries the node again. The problem is either with [0] the db prefetch -- it might download wrong values or an incomplete set of them -- or with [1] the chopsticks implementation, which ignores existing keys and fetches the same keys again.
If it comes down to the former ([0]) problem, I have the following set of dbs right now:
1. pre-fetched storage for the System.Account storage alone,
2. pre-fetched storage for the Balances and System pallets,
3. pre-fetched storage with prefix '0x',
4. pre-fetched storage with prefix '0x' plus a part of the keys that chopsticks fetched before the RPC endpoint stopped responding.
I can also pre-fetch any set of pallets needed for testing, but let's think about the information we can extract from the existing ones.
Idea
If I subtract the key-values of №3 from №4, I'll get the keys that chopsticks fetched during the script execution. If I compare the result with №1, I'll find out whether chopsticks is trying to fetch the same keys that are in the System.Account db.
Execution
To subtract №3 from №4, I created an empty sqlite db under the chopsticks-db folder, opened the №4 db, and executed the following SQL commands:
- attached both the №3 db (the '0x' one) and the resulting db with:
ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/0x-dbs/polkadot.sqlite' AS db_0x;
ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/chopsticks-db/polkadot.sqlite' AS db_chopsticks;
- inserted the diff between two into the resulting one:
INSERT INTO db_chopsticks.key_value_pair (blockHash, key, value)
SELECT k2.blockHash, k2.key, k2.value
FROM key_value_pair k2
LEFT JOIN db_0x.key_value_pair k1
ON k2.blockHash = k1.blockHash AND k2.key = k1.key
WHERE k1.blockHash IS NULL;
- checked that the difference in SELECT COUNT(*) FROM key_value_pair between the two equals the row count of the resulting db -- yes, it does.
To verify whether the resulting db contains the same keys as №1, I:
- entered the №1 db,
- attached the resulting db:
ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/chopsticks-db/polkadot.sqlite' AS db_chopsticks;
SELECT COUNT(*)
FROM db_chopsticks.key_value_pair k1
LEFT JOIN key_value_pair k2
ON k1.blockHash = k2.blockHash AND k1.key = k2.key
WHERE k2.blockHash IS NULL;
- the result was the number of entries in the resulting db, which means that none of the resulting db's keys are present in the System.Account db.
Conclusion
After all the computation above I think that:
- either prefetching System.Account fetches the wrong data,
- or the script:
let sysEntries = await polkadot.api.query.system.account.entries()
for (const [k,v] of sysEntries) {
console.log("key: ", k)
console.log("value: ", v)
}
is written incorrectly and accesses keys other than System.Account,
- or there is a bug in the chopsticks implementation.
Verifying whether it's the script or the pre-fetch
I can create a new db by fetching a single account using the existing 0x pre-fetched db and check the diff between the two:
- I copied the 0x pre-fetched db,
- used this script:
const {polkadot, assetHub} = await setupNetworks({
polkadot: {
endpoint: process.env.POLKADOT_ENDPOINT || 'ws://localhost:9944',
'wasm-override': 'runtime_wasm/polkadot_runtime.compact.compressed.wasm',
db: './dbs/polkadot.sqlite',
port: 8000,
},
});
let sysEntries = await polkadot.api.query.system.account('5Ee7mSUN7p9YGqzthB1uCbQPMo9zC2Z2Yv5b2nsHKDzmtseR')
console.log(sysEntries);
- It generated another sqlite3 db, which I moved into the 0x-plus-one-account folder,
- I opened that new db: sqlite3 0x-plus-one-account/polkadot.sqlite
- attached the raw 0x db: ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/0x-dbs/polkadot.sqlite' AS db_0x_db; ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/one-account-db/polkadot.sqlite' AS db_result;
- inserted the diff between the two:
sqlite> INSERT INTO db_result.key_value_pair (blockHash, key, value)
...> SELECT k1.blockHash, k1.key, k1.value
...> FROM key_value_pair k1
...> LEFT JOIN db_0x_db.key_value_pair k2
...> ON k1.blockHash = k2.blockHash AND k1.key = k2.key
...> WHERE k2.blockHash IS NULL;
- the resulting db contains 3315 rows. I suspect that's because of chopsticks' pagination?..
- I entered the system-accounts db: sqlite3 system-accounts-db/polkadot.sqlite
- attached the newly created db: ATTACH DATABASE '/Users/ndk/parity/ahm-dryrun/one-account-db/polkadot.sqlite' AS db_result;
- found the keys from the result db that are NOT present in the system-accounts db:
sqlite> SELECT COUNT(*)
...> FROM db_result.key_value_pair r
...> LEFT JOIN key_value_pair s
...> ON r.blockHash = s.blockHash AND r.key = s.key
...> WHERE s.blockHash IS NULL;
- the output is 3315
Final conclusion
I verified both single-account and all-accounts entry fetching using chopsticks. I believe I wrote the script correctly. @xlc, LMK if I did something wrong.
None of the keys fetched by chopsticks is present in the pre-fetched System.Account db. Maybe I need to pre-fetch another database, e.g. just System. If that's not the case, then I believe chopsticks does something strange while fetching the storage, unless I made a silly mistake along the way...
@xlc @seadanda eager to hear your thoughts on this!
Thanks for the investigation. I will try to debug this.
Also note that you must specify a block hash for the pre-fetched db to work. The values are stored under $blockHash-$key. Your fetching script should also fetch keys for a particular block instead of the latest head, which keeps moving and might have key changes, causing unexpected results.
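A minimal sketch of pinning the query, using polkadot.js's standard .at() decoration so entries() is read at the same block that was pre-fetched (the block number here is illustrative):

```ts
// Pin the query to the prefetched block instead of the moving head.
const prefetchedBlock = 25284232 // illustrative - use the block you actually prefetched
const blockHash = await polkadot.api.rpc.chain.getBlockHash(prefetchedBlock)
const apiAt = await polkadot.api.at(blockHash)

const sysEntries = await apiAt.query.system.account.entries()
console.log('accounts at pinned block:', sysEntries.length)
```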
I tried with an internal testnet with
chopsticks fetch-storages '0x' --endpoint ws://localhost:9944 --block 618266
and
chopsticks --config configs/test.yml with
endpoint: ws://localhost:9944
mock-signature-host: true
block: 618266
db: ./roc-test.sqlite
prefetch-storages:
- 0x
and they both yield the same result.
The problem is a bit different. When I
- pre-fetch storage with one of the commands above, let's say:
npx @acala-network/chopsticks@latest --config=configs/polkadot.yml with config:
endpoint: ws://localhost:9944
mock-signature-host: true
block: 25284232
db: ./dbs/polkadot.sqlite
wasm-override: ./runtime_wasm/polkadot_runtime.compact.compressed.wasm
prefetch-storages:
- 0x
^ not sure if wasm-override can cause any issues, I guess not.
- and try to re-use the pre-fetched db within the following script (to list all the system account keys):
import { test } from "bun:test";
import { setupNetworks } from '@acala-network/chopsticks-testing'
test('test account fetching', async() => {
const { polkadot } = await setupNetworks({
polkadot: {
endpoint: 'ws://localhost:9944',
'mock-signature-host': true,
block: 25284232,
'wasm-override': './runtime_wasm/polkadot_runtime.compact.compressed.wasm',
db: './dbs/polkadot.sqlite',
port: 8000,
},
});
let sysEntries = await polkadot.api.query.system.account.entries();
for (const [k,v] of sysEntries) {
console.log("key: ", k)
}
});
chopsticks starts fetching the keys once again, i.e. it does not reuse the pre-fetched db, and eventually fails with:
2025-03-25 06:55:26 RPC-CORE: getKeysPaged(key: StorageKey, count: u32, startKey?: StorageKey, at?: BlockHash): Vec<StorageKey>:: No response received from RPC endpoint in 60s
@xlc, could you try to execute a script that fetches all the system account keys (like the one I pasted) and see whether it reuses the pre-fetched db or not?
The db is used only to get values, not key ordering. getKeysPaged will hit the RPC.
Thanks for the input @ermalkaleci! Could you elaborate on key ordering? I am not sure I get this right; maybe some code reference? Is there a workaround to make chopsticks read values from the pre-fetched db, or to disable hitting the RPC?
Unfortunately no, Chopsticks needs an RPC or genesis to run. The db is a caching layer so you don't hit the RPC to read state all the time. Chopsticks cannot know how to find the next key, so it uses getKeysPaged; that's why you will see logs about getKeysPaged and RPC calls. If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.
If you prefetch state for block A, then you need to run chopsticks starting at block A, otherwise what you downloaded is useless.
I don't see any block specified in the previous comments, so if you run those scripts multiple times you will end up fetching state for different blocks; that might be one explanation for why you get different db sizes.
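To illustrate the split being described above, here is a rough TypeScript sketch (illustrative names only, not the actual chopsticks internals): key enumeration always goes to the remote node, while values are served from the db cache when present.

```ts
// Illustrative sketch of the described read path - not the real chopsticks code.
// `Rpc` and `Db` stand in for the remote node connection and the sqlite cache.
type Rpc = (method: string, params: unknown[]) => Promise<any>
type Db = {
  get: (blockHash: string, key: string) => Promise<string | null | undefined>
  put: (blockHash: string, key: string, value: string | null) => Promise<void>
}

async function getKeysPaged(rpc: Rpc, blockHash: string, prefix: string, pageSize: number, startKey: string) {
  // Key ordering is not stored in the cache db, so enumerating keys always hits the RPC.
  return (await rpc('state_getKeysPaged', [prefix, pageSize, startKey, blockHash])) as string[]
}

async function getStorage(rpc: Rpc, db: Db, blockHash: string, key: string) {
  const cached = await db.get(blockHash, key) // prefetched or previously fetched value
  if (cached !== undefined) return cached     // cache hit: no RPC call for the value
  const value = await rpc('state_getStorage', [key, blockHash]) // cache miss: fall back to RPC
  await db.put(blockHash, key, value)
  return value
}
```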
> I don't see any block specified in the previous comments, so if you run those scripts multiple times you will end up fetching state for different blocks; that might be one explanation for why you get different db sizes.
I do have it specified, both in the script:
polkadot: {
endpoint: 'ws://localhost:9944',
'mock-signature-host': true,
block: 25284232, <----------------------- HERE
'wasm-override': './runtime_wasm/polkadot_runtime.compact.compressed.wasm',
db: './dbs/polkadot.sqlite',
port: 8000,
},
and within the yaml config:
endpoint: ws://localhost:9944
mock-signature-host: true
block: 25284232 <----------------------- HERE
db: ./dbs/polkadot.sqlite
> let sysEntries = await polkadot.api.query.system.account.entries();

this here will get all the keys from RPC, and this looks like RPC rate limiting disconnecting you.

> endpoint: 'ws://localhost:9944',

are you running a node locally?
> let sysEntries = await polkadot.api.query.system.account.entries();
> this here will get all the keys from RPC, and this looks like RPC rate limiting disconnecting you.

Correct my logic:
- I pre-fetch the state at block 25284232,
- I set up the network at block 25284232,
- I start fetching all the system account keys. I suppose that the cache layer already contains all the keys at this block, doesn't it?
> are you running a node locally?

Yes
Chopsticks uses the db to cache the values; finding the next key will always hit the RPC.
> Chopsticks uses the db to cache the values; finding the next key will always hit the RPC.

Don't I find all the keys during the pre-fetch? I am trying to understand what values it caches and what "finding the next key" means.
> If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.

I thought pre-fetching '0x' pre-fetches all the keys, including the system.account keys, so I don't have to hit the RPC node at all, unless new accounts are created in the meantime, but their number should be negligible...
> Unfortunately no, Chopsticks needs an RPC or genesis to run. The db is a caching layer so you don't hit the RPC to read state all the time. Chopsticks cannot know how to find the next key, so it uses getKeysPaged; that's why you will see logs about getKeysPaged and RPC calls. If the received keys are already in the db (cache), then it will not call the RPC for the key-value. So prefetching state helps reduce the RPC calls needed to fetch every key-value, but again, it's a caching layer: if you ask for a key that doesn't exist, it will end up calling the RPC to try to find the value.

Maybe the other way around. My goal is to run the account migration from the Relay Chain to Asset Hub.
My idea was:
- Pre-fetch the storage for the Relay (using '0x' prefix),
- Override the wasm,
- Save all the account keys in memory. Here I supposed they would be taken from the pre-fetched db, since I did not create any new block and everything was already pre-fetched in the db.
- Run the migration. The migration iterates over the existing accounts, withdraws the funds and sends them over to AH using XCM.
- Read Asset Hub's and the Relay's storage. Here the values might not be pre-fetched, and RPC calls will be needed either way.
So anyway, I'll have a problem with fetching all the account keys, either in step 3 or in step 5. It means that I need a reliable way of fetching all the key-values from an endpoint (either local or remote), which brings me back to the problem of:
2025-03-25 06:55:26 RPC-CORE: getKeysPaged(key: StorageKey, count: u32, startKey?: StorageKey, at?: BlockHash): Vec<StorageKey>:: No response received from RPC endpoint in 60s
The error comes from this function, and I am wondering whether it's a client-side or server-side error. While I get familiar with the code, @ermalkaleci, maybe you know where the RPC timeout parameter is set? I guess it uses a default value...
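As an aside on the "reliable way of fetching all the key-values" point: instead of a single entries() call, polkadot.js can page through a storage map with entriesPaged, which spreads the work over many smaller getKeysPaged/queryStorageAt round trips. A rough sketch (the page size is arbitrary; polkadot is the setupNetworks context from the scripts above):

```ts
// Fetch all System.Account entries in pages instead of one huge call.
// With chopsticks the head stays at the configured block until new blocks are built,
// so paging at "latest" still reads the pinned state.
const pageSize = 500
let startKey: string | undefined
const allEntries: any[] = []

while (true) {
  const page = await polkadot.api.query.system.account.entriesPaged({ args: [], pageSize, startKey })
  allEntries.push(...page)
  if (page.length < pageSize) break
  startKey = page[page.length - 1][0].toHex() // continue after the last key of this page
}
console.log('total accounts:', allEntries.length)
```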
60s is high enough.
Why do you need to test all accounts? Why don't you override system.account with some dummy accounts and test with those?
I need automated tests for the Asset Hub migration, which touches (almost) all the Relay Chain accounts. Thus I want to override the wasm and emulate the real migration, to see what happens along the way and compare the chain (RC and AH) states before and after the migration.
It is still real, just with a fraction of the accounts. You can query some accounts, put them in a storage override, and run your migration. That will be enough to test the migration; I don't see why you need to do it with all accounts.
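For reference, a sketch of that approach using the dev.setStorage helper that chopsticks-testing exposes on the network context (the addresses and balances below are illustrative):

```ts
// Seed a handful of dummy accounts via a storage override instead of prefetching everything.
// Addresses are the well-known dev accounts; any valid addresses work. Balances are illustrative.
const dummyAccounts = [
  '5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY', // Alice
  '5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty', // Bob
]

await polkadot.dev.setStorage({
  System: {
    Account: dummyAccounts.map((address) => [
      [address],                                     // storage key args
      { providers: 1, data: { free: 1000 * 1e10 } }, // 1000 DOT at 10 decimals
    ]),
  },
})
```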
That's an option too! However, when I ran the migration using plenty of accounts (without fetching their state), I found two bugs related to overweight. We aim to test and dry-run the migration in an environment that closely mirrors the real one.
If increasing the timeout is an issue, then how about retrying transient errors a few times?
I don't think the timeout is the issue here. The RPC should respond really fast, especially if you're running it locally, unless there's some error with polkadot/api.
I just figured out that the prefetch-storages feature is useless: chopsticks will prefetch storage for all the keys it receives anyway.