rippled icon indicating copy to clipboard operation
rippled copied to clipboard

Address rare corruption of NFTokenPage linked list

Open scottschurr opened this issue 1 year ago • 4 comments
trafficstars

High Level Overview of Change

It was discovered that there are a couple of NFT directories on the MainNet that are missing links in the middle of the chain. This pull request does several things to address the corruption:

  1. The source of the corruption was identified and fixed, which should prevent the problem from happening in the future.
  2. A new transaction, LedgerStateFix, is added which allows any account on ledger to pay the fee to repair the links of any single account.
  3. More invariants are added which should prevent similar problems from happening in the future.
  4. And unit tests are added to exercise all of the above changes.

A new amendment, fixNFTokenPageLinks, includes all of these changes.

Details follow.

The Source of the Corruption

There are two different transactions that introduced the corruption into the two directories. In both cases the following conditions were met:

  1. There were at least two NFToken pages in the directory.
  2. The next-to-last page was completely full, holding 32 NFTokens.
  3. The very last page of the directory contained only one NFToken, and the transaction removed that last remaining token from the last page.

When these conditions were met, the last NFToken page was removed and the next-to-last page was left as the final page in the directory.

That would be fine except the NFToken directory has a an expectation that the last page has a specific index. The page with that index was just deleted. So when an NFToken is added to the directory, and that token has a high enough value that it does not belong on the new last page, then a new last page is created that has no links to the previous page.

Now there's a linked list with a hole in the middle of the list.

The fixNFTokenPageLinks amendment modifies the NFToken page coalescing code to notice when the very last page of the directory would be removed. In that case it simply moves all of the contents of the next lower page into the last page and deletes the next-to-last page (and fixes up the links).

New invariant checks also validate aspects of the links on pages, so a similar corruption should cause a tecINVARIANT_FAILED transaction result. That should prevent similar corruptions in the future.

The New Transaction: LedgerStateFix

Once the amendment goes live then that prevents any similar NFToken directory corruption in the future. But we are still left with two damaged NFToken directories. And potentially more that could be created between now and when the amendment goes live.

We could leave the corruptions. I don't like that idea for two reasons:

  1. These corruptions seriously inconvenience those accounts that suffered the corruption, through no fault of their own.
  2. When you drop food on the kitchen floor you clean it up. You don't put up a sign that says, please don't step here. You clean it up.

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

But we would also rather not introduce a new single-use transaction for this very tiny corner case.

So this approach creates a new transaction aimed at cleaning up messes in the ledger: LedgerStateFix. We'll use it for this situation. If there are other repairable corruptions in the future we can re-use this transaction to repair those future corruptions.

LedgerStateFix

LedgerStateFix fields (in addition to common fields) are:

Field Name Data Type Required/Optional Description
TransactionType uint16 Required LedgerStateFix
Account STAccount Required Account that signs and pays fee
Fee STAmount Required Fee paid for transaction
Flags uint32 Optional Available for future fixes
LedgerFixType uint16 Required Only 1 is currently supported
Owner STAccount Optional Required if LedgerFixType == 1

TransactionType — identifies this as a LedgerStateFix transaction.

Account — identifies the account signing and submitting the transaction as well as paying the Fee.

Fee — this transaction is rare and potentially compute intensive. We want to discourage folks from spamming the ledger with these. So the minimum fee is the same as the fee for an AccountDelete transaction. And, yes, if the transaction fails with a tec code the fee is still charged.

Flags — not needed for LedgerFixType == 1. May be used for some unknown future ledger fix.

LedgerFixType — for now the only valid value is 1, which fixes the NFToken directory for a single account.

Owner — the account that owns the NFToken directory that needs fixing. Need not have any relationship to Account.

Fixing NFTokenPage Links

Once the amendment goes live, any corrupted NFTokenPage directory can be fixed by submitting a transaction like this:

{
   "Account" : "<Signer and fee payer>",
   "Fee" : "2000000",
   "LedgerFixType" : 1,
   "Owner" : "<Account with corrupted NFTokenPage directory",
   "Sequence" : <n>,
   "SigningPubKey" : "<Account's public key>",
   "TransactionType" : "LedgerStateFix",
   "TxnSignature" : "<Signature>",
}

If a repair is actually completed, then tesSUCCESS is returned. If there is an error or if the directory did not need repair then a tec code is returned.

Future Fixes

Unfortunately this is not the first time ledger state has needed tidying up. Amendment fixTrustLinesToSelf removed invalid state from the ledger. Since programmers are humans we can expect similar kinds of problems to show up in the future.

If there are future ledger state problems, they can be attached to the LedgerStateFix transaction by picking a LedgerFixType value other than 1. Any additional parameters can be added as optional values.

Context of Change

While I was research how efficient NFTokenPages actually are in use, I stumbled across two NFToken directories that were missing links in the middle of the chain. This is a form of ledger corruption. But its consequences do not seem too dire, even for the owners of those directories. Known consequences are:

  1. The account_objects RPC command returns only those objects at the beginning of the NFToken directory up to the point where a link is missing. So, for those accounts with missing links, the NFTokens returned by account_objects is incomplete.
  2. The account_nfts RPC command has a problem similar to account_objects. account_nfts returns NFTokens only up to the point where a link is missing.

There may be other consequences, but I've looked for and not found any. Particularly, I've not seen any evidence of lost NFTokens. NFTokens still get added to the correct owner's page. They simply are not reported by the RPC commands.

Type of Change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [x] Tests (there are tests for the new code)
  • [x] Documentation will need to be added to cover the new transaction.

API Impact

  • [x] Public API: New feature (a new transaction type is added)

Consideration

This NFT-related amendment arrives at about the same time as a different NFT-related amendment: https://github.com/XRPLF/rippled/pull/4946

Should these two pull requests be merged into a single amendment?

scottschurr avatar Mar 09 '24 00:03 scottschurr

Codecov Report

Attention: Patch coverage is 87.87879% with 16 lines in your changes missing coverage. Please review.

Project coverage is 74.9%. Comparing base (0a331ea) to head (a6ae5b5).

Files Patch % Lines
src/xrpld/app/tx/detail/NFTokenUtils.cpp 81.6% 14 Missing :warning:
src/xrpld/app/tx/detail/LedgerStateFix.cpp 93.8% 2 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop   #4945    +/-   ##
========================================
  Coverage     74.8%   74.9%            
========================================
  Files          766     768     +2     
  Lines        63004   63134   +130     
  Branches      8842    8848     +6     
========================================
+ Hits         47139   47264   +125     
- Misses       15865   15870     +5     
Files Coverage Δ
include/xrpl/protocol/Feature.h 100.0% <ø> (ø)
include/xrpl/protocol/SField.h 100.0% <ø> (ø)
include/xrpl/protocol/TER.h 100.0% <ø> (ø)
src/libxrpl/protocol/Feature.cpp 94.6% <ø> (ø)
src/libxrpl/protocol/SField.cpp 77.5% <ø> (ø)
src/libxrpl/protocol/TER.cpp 100.0% <ø> (ø)
src/libxrpl/protocol/TxFormats.cpp 100.0% <100.0%> (ø)
src/xrpld/app/tx/detail/InvariantCheck.cpp 91.1% <100.0%> (+4.1%) :arrow_up:
src/xrpld/app/tx/detail/InvariantCheck.h 100.0% <ø> (ø)
src/xrpld/app/tx/detail/LedgerStateFix.h 100.0% <100.0%> (ø)
... and 4 more

... and 2 files with indirect coverage changes

Impacted file tree graph

codecov-commenter avatar Mar 09 '24 01:03 codecov-commenter

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated? Are you worried that more corrupted pages will create before the amendment activates?

exunda avatar Jun 02 '24 14:06 exunda

@exunda asked:

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated?

That's a fair question. It was answered in the (admittedly long) description of the pull request. Here's the quote:

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

Does that answer your question?

scottschurr avatar Jun 03 '24 15:06 scottschurr

Yes thank you for explaining more

exunda avatar Jun 11 '24 06:06 exunda

Due to changes in develop, there are now merge conflicts. @scottschurr would it make sense for you to bring this branch up-to-date?

intelliot avatar Jul 25 '24 23:07 intelliot

@intelliot, I'll work on that merge.

scottschurr avatar Jul 29 '24 23:07 scottschurr

Once the merge is done, will this PR be ready to merge?

ximinez avatar Jul 31 '24 17:07 ximinez

@ximinez, the merge is almost done -- I need to fix some clang formatting. Gahh, merges across the restructure are painful.

Once clang-format is fixed (soon) I think it could be merged to develop. It has two approvals. But, given that it's an amendment, it would be nice if there were one more review from a senior developer. I had not marked the branch as passed because I was waiting for that additional review.

scottschurr avatar Jul 31 '24 18:07 scottschurr

Note: I believe this fixes internal ticket id RIPD-1841.

intelliot avatar Jan 24 '25 21:01 intelliot