Impact of `lookahead` variable on users of bdk_wallet

Open thunderbiscuit opened this issue 1 year ago • 3 comments

Hello folks. After reviewing the new builder pattern for the wallet type and speaking to people in Nashville, I've come to the conclusion that the idea behind the lookahead and its impact on users of bdk_wallet is not well understood. Moreover, setting the lookahead value is part of the public API of the Wallet (through the builder) but is not (I think?) part of persistence, so understanding it well is important for users.

This issue is meant to clear that up for everyone, and arrive at either:

Bubbling up to the bdk_wallet API docs what setting this value entails and how it will impact the behaviour of the wallet, including tradeoffs and dangers (or not) of setting different lookahead values between sessions.
Remove the setter from the Wallet public API entirely if it's better left as an internal construct.

CC'ing the two people I have not yet asked about this in person or over the dev call yesterday: @evanlinjin @LLFourn.

Related API Docs

Just so they're easy to find for everyone, here are the API docs on this:

Previous Discussions

See my Q2 on #1533:

I see that the lookahead is a setter, but even after reading the docs I'm not sure what it's setting exactly because it just sounds like a sort of stop gap but not a stop gap. See the docs here on lookahead and the more explicit lookahead in KeychainTxOutIndex.

When an user first recovers a wallet (i.e. from a recovery phrase and/or descriptor), we will NOT have knowledge of which script pubkeys are revealed. So when we index a transaction or txout (using index_tx/index_txout) we scan the txouts against script pubkeys derived above the last revealed index. These additionally-derived script pubkeys are called the lookahead.

Don't we scan up to the stop gap? I know this from experience when looking at the Esplora logs on full scan and syncs (for example a stop gap of 3 with a lookahead of 100 will query the Esplora instance for only 4 addresses), so not sure if maybe it refers to something else?

The KeychainTxOutIndex is constructed with the lookahead and cannot be altered.

Wait so is it persisted then? Or by that do we mean it can't be altered per session but altered on every load of the wallet?

I know (think?) the lookahead is not persisted, but is there a danger in having two different lookahead between sessions? What if I set a lookahead of 3 when I create a wallet, and 300 the next time I load the wallet? Or the opposite?

And my follow up here:

Does lookahead impact the sync/full_scan workflows? I have not been able to find this to be the case locally, so I'm still unclear about what this lookahead is doing, and because it's part of the public API of the Wallet type I want to make sure I really get it. Here's what I have tried on regtest, and looking at the logs for my esplora instance:

Stop gap of 3, lookahead of 100. The wallet syncs 6 addresses total (3 per keychain). The lookahead does not impact the spks the wallet attempts to sync.
Stop gap of 25, lookahead of 7. The wallet syncs 50 addresses, (lookahead didn't seem to be a problem, again not impacting the syncing process)
Send coins to address 19, set the lookahead to 7 and the stop gap to 25. The coins are seen and wallet shows correct balance. Lookahead below the stop gap did not cause any problems.

More notes on this: the docs here say:

Without a lookahead the index will miss outputs you own when processing transactions whose output script pubkeys lie beyond the last revealed index. In certain situations, such as when performing an initial scan of the blockchain during wallet import, it may be uncertain or unknown what the index of the last revealed script pubkey actually is.

Again this makes it sound close to the concept of the stop gap, but I assume it's different somehow or a more generalized version of it that can be used in the deeper logic of the KeychainIndex? That's as far as the API docs go.

And the response from @valuedmammal:

The lookahead doesn't appear to have a role during full scan for esplora because we get the SPKs to sync by calling all_unbounded_spk_iters

For sync it's also not relevant because we only sync SPKs that are already revealed.

In short, the lookahead defines a number of scripts to derive ahead of time without revealing them. If you do a full scan with the default lookahead of 25 and persist the wallet, then on subsequent loads ~~those SPKs are still stored in the index and there shouldn't be any harm in setting a new lookahead.~~

Edit: Nope that's also wrong. we don't persist derived SPKs, only the value of last_revealed. So yeah we probably need more testing.

One way to illustrate the effect of the lookahead is to create a wallet with the default value of 25. Then peek at address index 25. The method Wallet::is_mine will return false because internally the address is not yet derived (remembering that SPKs are 0-indexed).

For context one of the reasons we chose to make the lookahead configurable is because for a short time we had it set to 1000 and users reported unbearably long load times. Beyond that, I don't think it's something users should generally care about other than to know it exists

Further questions

Following up on the comment from VM,

Is the lookahead important for people that would add custom transactions to track in their local TxGraph, which contain addresses that have not yet been revealed, and for methods like is_mine() to work as intended?
Quote: we chose to make the lookahead configurable is because for a short time we had it set to 1000 and users reported unbearably long load times. Here do we mean creating the wallet itself, i.e. the operation of deriving the 1000 addresses in the lookahead?

Aug 07 '24 15:08 thunderbiscuit