bitcoin Add a "tx output spender" index

This PR adds a new "tx output spender" index, which allows users to query which tx spent a given outpoint with the gettxspendingprevout RPC call that was added by https://github.com/bitcoin/bitcoin/pull/24408.

Such an index would be extremely useful for Lightning, and probably for most layer-2 protocols that rely on chains of unpublished transactions.

UPDATE: this PR is ready for review and issues have been addressed:

using a watch-only wallet instead would not work if there is a significant number of outpoints to watch (see https://github.com/bitcoin/bitcoin/pull/24539#issuecomment-1276595646)
this PR does not require -txindex anymore

This index is implemented as a simple k/v database (like other indexes):

keys are siphash(spent outpoint) (64 bytes)
values are list of spending tx positions( in case there are collisions which should be exceptionally uncommon)., using the same encoding as txindex (variable size, max is 12 bytes).

The spending tx can optionally be returned by gettxspendingprevout (even it -txindex is not set).

Mar 11 '22 19:03 sstone

Concept ACK. It would be good know more about specific use-cases for this, if you can provide some links or more description. But the implementation seems very simple, so I don't think adding this would be a maintenance burden.

Mar 11 '22 21:03 ryanofsky

@ ryan, I think the idea is that there is one coin that is owned by more than one entity. There may be several (conflicting) pre-signed txs spending it and as soon as it is spent, all entities with ownership in it want to check which pre-signed tx spent the coin. If they use Bitcoin Core to monitor the coin, they may use:

A watch-only wallet (which requires a file on disk and BDB/Sqlite)
Walk the raw chain and raw mempool themselves (According to https://github.com/bitcoin/bitcoin/pull/24408#issuecomment-1059188351 this is doable for the chain, but may not be for the mempool)
Use an index. A blockfilterindex might speed up scanning the chain, whereas this index directly gives the answer.

Mar 12 '22 08:03 maflcko

Lightning channels are basically trees of unpublished transactions:

a commit tx that spends a shared onchain "funding" UTXO
HTLC txs that spend the commit tx (there can be competing HTLC txs that spend the same commit tx output: one for successful payments, one for timed out payments)

There is also a penalty scheme built into these txs: if you cheat, you need to wait before you can spend your outputs but your channel peer can spend them immediately.

And LN is also transitioning to a model where these txs pay minimum fees and can be "bumped" with RBF/CPFP.

=> it's very useful for LN nodes to detect when transactions have been spent, but also how they've been spent and react accordingly (by extracting info from the spending txs for example), and walk up chains of txs that spend each other.

As described above, there are tools in bitcoin core that we could use and would help a bit, and Lightning nodes could also create their own indexes independently or use electrum servers, block explorers, but this would make running LN nodes connected to your own bitcoin node much easier.

Mar 12 '22 11:03 sstone

Concept ACK

I think the use case is clear and if others think the maintenance burden is reasonable there doesn't seem to be an obvious downside.

This PR can be viewed as an "extension" to #24408

Seems like review should be focused on #24408 first but assuming that gets merged, Concept ACK for this.

Mar 12 '22 15:03 michaelfolkson

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	ryanofsky, michaelfolkson, mzumsande, glozow, aureleoules, achow101, TheCharlatan
Approach ACK	w0xlt

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

Mar 12 '22 21:03 DrahtBot

* `getmempooltxspendingprevout` in #24408 takes a list of outpoints, `gettxospender` takes a single outpoint. Would it make sense to have the same format here (I'd guess a common use case would be to check both RPCs with the same outpoint(s)?)

Bitcoin supports batched RPC request so from a functional p.o.v it's equivalent to passing a list of outpoints but it's a bit easier to reason about imho. I think that the use case for getmempooltxspendingprevout is a bit different: you want to bump fees for a set of txs that are related to the same channel and want to understand who is spending what. You may also have txs with multiple inputs, and would pass them all together to getmempooltxspendingprevout.

For gettxospender the use case is trying to understand why channels have been closed and react if needed, if we could pass in multiple outpoints they would likely be unrelated.

* Taking this one step further, `getrawtransaction` queries transaction data from the mempool or from blocks (if txindex is enabled) in one RPC. Would it make sense to have something similar for outpoints, instead of two separate RPCs?

Maybe, let's see where the discussion in https://github.com/bitcoin/bitcoin/pull/24408 but to me it feels cleaner to have separate calls.

* I think that the indexed data will not be deleted in case of a reorg (but possibly overwritten if an outpoint would be spent in a different tx as a result of a reorg). That means that a result from `gettxospender` could be stale. Would it make sense to delete entries or should we at least mention this possibility in the rpc doc?

* What's the reason for the dependency on -txindex? As far as I can see it's not technical, so would knowing the fact that an outpoint was used in a tx (even if we can't lookup the details of that tx further) be sufficient for some use cases?

We would almost always want to get the actual tx that spends our outpoint. My reasoning was that relying on -txindex makes gettxospender robust and consistent in the case of reorgs and is also simpler and easier to understand that storing block positions instead of txids.

Mar 14 '22 10:03 sstone

Concept ACK, nice

Apr 01 '22 18:04 glozow

Rebased on https://github.com/bitcoin/bitcoin/pull/24408

Apr 25 '22 17:04 sstone

Looks like the test somehow went away? If #https://github.com/bitcoin/bitcoin/pull/24408 is merged soon, you can fix everything on the next rebase.

Apr 27 '22 12:04 maflcko

Looks like the test somehow went away? If ##24408 is merged soon, you can fix everything on the next rebase.

Yes sorry about that, I've brought it back. I will also add another commit to move gettxspendingprevout out of rpc/mempool.cpp.

Apr 27 '22 15:04 sstone

As with #24408, unless there's a need for looking up an unpredictable outpoint after its spend is mined (why?), I think a watch-only wallet is the correct approach for this use case, and expecting users to build infinitely-growing indexes isn't a good solution.

Apr 29 '22 20:04 luke-jr

@sstone I created a branch that modifies this PR to include the in_mempool and block_height of the transaction that spends the output. The result is like this:

[
  {
    "txid": "c8de5b22e7dac049cd6a2628580876d74c16a58a886be2b4b1c744901f8e953e",
    "vout": 0,
    "spendingtxid": "117dd0df58201d4ac6c49472dc59208487caa7d79079a4bb32d85c16f4174e72",
    "in_mempool": false,
    "block_height": 88551
  }
]

If you agree with this approach, feel free to use the code of that branch. https://github.com/w0xlt/bitcoin/tree/add_txospender_index_2

May 02 '22 18:05 w0xlt

There is no need for a new commit to address review comments. The last commit can be squashed.

https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

May 03 '22 19:05 w0xlt

The static RPCHelpMan getindexinfo() RPC should also be updated.

static RPCHelpMan getindexinfo()
{
    // ...
    if (g_txindex) {
        result.pushKVs(SummaryToJSON(g_txindex->GetSummary(), index_name));
    }

    if (g_coin_stats_index) {
        result.pushKVs(SummaryToJSON(g_coin_stats_index->GetSummary(), index_name));
    }

+    if (g_txospenderindex) {
+        result.pushKVs(SummaryToJSON(g_txospenderindex->GetSummary(), index_name));
+    }

    ForEachBlockFilterIndex([&result, &index_name](const BlockFilterIndex& index) {
        result.pushKVs(SummaryToJSON(index.GetSummary(), index_name));
    });
    // ...
}

May 04 '22 21:05 w0xlt

The static RPCHelpMan getindexinfo() RPC should also be updated.

static RPCHelpMan getindexinfo()
{
    // ...
    if (g_txindex) {
        result.pushKVs(SummaryToJSON(g_txindex->GetSummary(), index_name));
    }

    if (g_coin_stats_index) {
        result.pushKVs(SummaryToJSON(g_coin_stats_index->GetSummary(), index_name));
    }

+    if (g_txospenderindex) {
+        result.pushKVs(SummaryToJSON(g_txospenderindex->GetSummary(), index_name));
+    }

    ForEachBlockFilterIndex([&result, &index_name](const BlockFilterIndex& index) {
        result.pushKVs(SummaryToJSON(index.GetSummary(), index_name));
    });
    // ...
}

Thanks! done in https://github.com/bitcoin/bitcoin/pull/24539/commits/d9d96f8b5dab7a617efb95c7a4c53114a7815abe

May 05 '22 20:05 sstone

As described above, there are tools in bitcoin core that we could use and would help a bit

MarcoFalke and luke-jr suggested using a watch-only wallet for this use case. I don't fully understand how Lightning works in depth/implementation. Do you know @sstone what would be the pros and cons of using a watch-only wallet for this?

I haven't fully synced the txospenderindex (height 568907) and I know the index is optional but it's already ~60GB, so is it worth it for layer 2 node operators?

Sep 28 '22 13:09 aureleoules

For most LN nodes watch-only wallets are probably enough: you would import all your channel outpoints, and check your wallet transactions for spending transactions (bitcoin core should include them). AFAIK you would do this by importing each channel outpoint in a single "address" descriptor (though it seems that there is currently no way to remove a descriptor ?).

But I'm not sure yet how practical and efficient this would be for large nodes with 10s of 1000s of transactions (I would like to have bitcoin core do some kind of filtering before it returns wallet transactions).

Oct 03 '22 16:10 sstone

update: unless I've missed something, unless you have just a few transactions to monitor, using watch-only wallet would not work as is and even if it was modified it would still create serious operational issues:

If we were to use a single watch-only wallet and import all the outpoints that we need to watch, with one descriptor per outpoint , we would still to to fetch all wallet transactions to get spending transactions even if we used labels when importing our descriptors because label filtering only works for incoming transactions, so this does not scale (and again there does not seem to be a way to remove descriptors once they've been imported).

Since with LN we currently know which transactions would spend the outpoints that we are watching, we could also import them. It would solves our filtering problem but we would end up with a huge number of transaction in our wallet (millions of them).

Another option would be using one watch-only wallet per channel. This would be much better from an indexing point of view but would scale even less because there would be one wallet database per channel.

And even if we modified how watch-only wallets work to add an optional "spent-by" field to listtransactions (this would be fairly easy), using them would be a problem from an operational point of view: rebuilding a node can be done very quickly, as we don't even need to restore the bitcoin wallet that we used before (we just need to have a backup): we can just point our system to a new bitcoin node with a fresh wallet and we're good to go. With watch-only wallets we would first need to rebuild this wallet, which could take hours, before we're operational again.

=> I still believe that indexing "spending" transactions is a better approach than watch-only wallets.

Oct 12 '22 18:10 sstone

Thanks for the clarification @sstone.

I still believe that indexing "spending" transactions is a better approach than watch-only wallets.

Agreed, this is Concept ACK for me then.

rebuilding a node can be done very quickly, as we don't even need to restore the bitcoin wallet that we used before (we just need to have a backup): we can just point our system to a new bitcoin node with a fresh wallet and we're good to go. With watch-only wallets we would first need to rebuild this wallet, which could take hours, before we're operational again.

The "new bitcoin node" would still need to have the txospenderindex constructed though right? Which also takes some time no?

Oct 12 '22 18:10 aureleoules

The "new bitcoin node" would still need to have the txospenderindex constructed though right? Which also takes some time no?

Yes, you would need to run your bitcoin nodes with -txospenderindex and -txindex (but theses indexes are "generic" and completely independent from your application).

Oct 13 '22 08:10 sstone

I did not review the code but tested the PR at commit 972728ab72b63ad074ffc1f2d8275d0c05672fa8 and it works.

It would be great to reduce the size of the index because it is well larger than txindex:

bitcoin/indexes $ du -h --max-depth=1
119G	./txospenderindex
38G	./txindex
157G	.

Instead of storing outpoint (32 + 4 bytes) -> txid (32 bytes) it could be outpointhash (32 bytes) -> txid (32 bytes) to gain 4 bytes per txos (or even using RIPEMD160 to gain 4 + 12 bytes per txo).

The txospenderindex stores redondantly the funding_txid as key for each transaction's outputs. It could be made almost half its current size if it was possible to store with the following schema:

funding_txid (32 bytes) -> concat_txids (32 * number_of_outpoints_in_funding_txid bytes)

where concat_txid would be all the spending txid concatenated in the order of the vout of the funding transaction or 32 bytes at 0 if the outpoint is unspent. To get the spending transaction of an outpoint txid:vout, you would just have to read the bytes 32 * vout to 32 * (vout+1) of the concat_txids given by the index. However this requires modifying the value at a given key during the sync of the index, I don't know if it is an issue.

It could be nice to provide more informations on the spending transaction too, like blockheight of spending transaction and the index of the input spending the outpoint in the spending transaction althought not necessary since these information can be found by using txindex.

Nov 18 '22 16:11 Pantamis

It would be great to reduce the size of the index because it is well larger than txindex: ... Instead of storing outpoint (32 + 4 bytes) -> txid (32 bytes) it could be outpointhash (32 bytes) -> txid (32 bytes) to gain 4 bytes per txos (or even using RIPEMD160 to gain 4 + 12 bytes per txo).

I went for a simple scheme that is easy to use and did not try to optimize for space as it would still be O(n) so there's not much to be gained unless I'm missing something. It could be made smaller by using blockheight | txindex |outputindex (8 bytes) instead of txids ((this is how we define "short channel ids" in LN), but it would be harder to use.

The txospenderindex stores redondantly the funding_txid as key for each transaction's outputs. It could be made almost half its current size if it was possible to store with the following schema:

funding_txid (32 bytes) -> concat_txids (32 * number_of_outpoints_in_funding_txid bytes)

where concat_txid would be all the spending txid concatenated in the order of the vout of the funding transaction or 32 bytes at 0 if the outpoint is unspent. To get the spending transaction of an outpoint txid:vout, you would just have to read the bytes 32 * vout to 32 * (vout+1) of the concat_txids given by the index. However this requires modifying the value at a given key during the sync of the index, I don't know if it is an issue.

Same reason as above, I'll have a look but I'm not sure the gain is worth the added complexity.

It could be nice to provide more informations on the spending transaction too, like blockheight of spending transaction and the index of the input spending the outpoint in the spending transaction althought not necessary since these information can be found by using txindex.

Yes, I supposed that users would call gettransaction to get additional info on the spending tx

Nov 21 '22 14:11 sstone

Concept ACK

Dec 11 '23 16:12 sedited

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/21097591816}

Mar 15 '24 12:03 DrahtBot