I think the call is large because BDK is asking Bitcoin Core to watch 100 addresses. Is there a way to reduce this number? I don't see it in the RpcBlockchain options.
Another option would be to have an option to indicate that you trust the bitcoin core node, and send it an xpubkey-based descriptor. Although this is based off the assumption I'm making that it's the number of addresses that is the problem, and not another issue with importdescriptors.
Thanks @casey for reporting this.. This is porobably one more reason to move from "raw" descriptors.. I think core now supports Xpub based descriptors too.. Might be a good time to shift to it..
xpub-based descriptors would be great! In the mean time, is there a away to have bdk watch fewer addresses, e.g. watch 10 instead of 100? This is making our tests really slow, so a bandaid would be great.
FYI I believe core still doesn't support all the miniscript operators in descriptors that bdk supports, which is why we can't use xpub-based descriptors. See https://bitcoindevkit.org/descriptors/#operators.
I looked into this and reproduced this issue. I probably misunderstood the purpose of start_script_count. Which does not influence the script count at sync time.
Detailing some findings below as fixing this might require some broader discussions.
It seems enforcing script sync number lower than our CACHE_ADDR_BATCH_SIZE (hard coded as 100) is not much straight forward, and can have deeper effect in wallet operations..
The RPC sync always imports the full script pubkey list from the database, defined here. https://github.com/bitcoindevkit/bdk/blob/13cf72ffa770289dc06fae3ce84b52ce392fc0cb/src/blockchain/rpc.rs#L387-L392
Where ext_spks and int_spks are the full list of external and internal descriptor.
https://github.com/bitcoindevkit/bdk/blob/13cf72ffa770289dc06fae3ce84b52ce392fc0cb/src/blockchain/rpc.rs#L317-L318
This can make RPC sync slow as we will be redoing duplicate imports in every sync calls.
And it doesn't seem to be very straight forward either to implement a smaller batch of address for sync only, than our CACHE_ADDR_BATCH_SIZE.
Questions:
How to keep track of how many addresses have been imported in the blockchain. Should this be in RpcSyncParams?
How can we know when to import new addresses/descriptor into RPC?
The sync batch size reduces with CACHE_ADDR_BATCH_SIZE. Should we make that value configurable instead? That seems like the easiest way to solve this particular issue at hand for testing configurations (@casey, @terror you might wanna just use a fork with that value changed, its in src/wallet/mod.rs, until we figure what to do here?? 😅)
But I think we do need to think out something smarter for the RPC blockchain sync. Importing everything for each sync call seems suboptimal.
Tagging @afilini @evanlinjin who might have checked this part of the world recently..
Thank you @casey for filing this issue, and @rajarshimaitra for identifying the part of the codebase responsible for the inefficient sync logic.
When I was restructuring the RpcBlockchain implementation, I purposefully make a trade-off for reliability over efficiency. This was in part, because of the limitations of the Bitcoin Core JSON-RPC interface and the limitations of the Blockchain trait itself in BDK. The result of this, is having to import all relevant descriptors/scriptPubKeys in every sync cycle.
I propose the following solution:
Make the importmulti/importdescriptors RPC call paginated. I assume this would help us avoid the socket error 35 (probably need to test this theory).
Have an internal "store" that RpcBlockchain uses to keep track of which descriptors/scriptPubKeys are already imported into Bitcoin Core. This store can just be a simple integer that represents derivation index to start import from for next cycle (but should be unique for each Bitcoin Core wallet instance).
Maybe @afilini and @danielabrozzoni can provide a bit more insight/opinions.
PR: #740
P.S. I do not think changing CACHE_ADDR_BATCH_SIZE will solve the issue.
Just wanted to share that we updated our BDK fork to change CACHE_ADDR_BATCH_SIZE from 100 to 1, and this reduced our total node startup time by over 10x - BDK sync was by far the biggest bottleneck. We use async esplora to sync BDK, and a completely empty wallet was making ~100 API calls to Blockstream Esplora (in batches of 25). Caching addresses doesn't make any sense for us because we expect to generate the wallet seed and update the derivation indexes ourselves, and don't prioritize the import of external seeds as a first-class feature. Ideally this batch size is configurable - is that the case in BDK 1.0? Based on the discussion above it sounds like the current caching strategy might be disposed of entirely in 1.0?
@MaxFangX indeed. There is no "caching" in BDK 1.0. Script pubkeys are not stored on disk at all. I am quite surprised that the old implementation performs so badly though. It should be pulling in batches of whatever you configure in the esplora client and then stopping after some configurable stop gap iirc.
I don't think we have a importdescriptors feature at all anymore. you can still do things this way at the application level if you want but there are no guiderails on how to approach using bitcoin core's wallet as your main chain indexing wallet and then BDK to do coin selection/tx construction.
There have been a an attempt by @rajarshimaitra to add this back in the past which would be the best thing to look at to re-introduce this. In general, the problem with the previous appraoch in <1.0 is that it was not using bitcoin rpc as it was intended to beused but instead inserting huge numbers of SPKs into it (hence the issue).
Makes sense. Anyone who ran into this issue please try the shiny new 1.0.0-alpha block by block rpc syncing and let us know if you see any performance issues.