rippled icon indicating copy to clipboard operation
rippled copied to clipboard

Fix invariant checks for Permissioned Domains

Open oleks-rip opened this issue 1 month ago • 1 comments

High Level Overview of Change

Fixed typo in invariant checks, found by Immunefi

Type of Change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Refactor (non-breaking change that only restructures code)
  • [ ] Performance (increase or change in throughput and/or latency)
  • [ ] Tests (you added tests for code that already exists, or your new feature included in this PR)
  • [ ] Documentation update
  • [ ] Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • [ ] Release

oleks-rip avatar Dec 10 '25 16:12 oleks-rip

Codecov Report

:x: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 79.1%. Comparing base (f816ffa) to head (34ea901).

Files with missing lines Patch % Lines
src/xrpld/app/tx/detail/InvariantCheck.cpp 90.0% 1 Missing :warning:
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #6134     +/-   ##
=========================================
- Coverage     79.1%   79.1%   -0.0%     
=========================================
  Files          839     839             
  Lines        71386   71382      -4     
  Branches      8342    8343      +1     
=========================================
- Hits         56441   56435      -6     
- Misses       14945   14947      +2     
Files with missing lines Coverage Δ
src/xrpld/app/tx/detail/InvariantCheck.h 100.0% <ø> (ø)
src/xrpld/app/tx/detail/InvariantCheck.cpp 92.0% <90.0%> (-0.1%) :arrow_down:

... and 1 file with indirect coverage changes

Impacted file tree graph

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Dec 10 '25 17:12 codecov[bot]

Does this need an amendment?

mvadari avatar Dec 11 '25 14:12 mvadari

Does this need an amendment?

No it didn't change anything. I can't imagine situation where before is different from after (except where before is null)

oleks-rip avatar Dec 11 '25 20:12 oleks-rip

@ximinez

  1. sleStatus_ is an optional. It should be a vector, in case more than one permissioned domain is affected

modified and added check for size() > 1 (which is actually impossible as there is no such transaction)

  1. Because you're checking the integrity of the object, and not checking any of the changes between before and after

removed check for "before"

  1. The conditions in finalize are only checked for successful PermissionedDomainSet transactions. 2.1 If no other transaction should be able to modify a permisioned domain object, then it should check that

I don't think it is possible. Invariant checks are all called for every tx, and I don't see a way to figure out that this particular PermissiondDomain sle object is modified by this particular tx. We just assumed that if there is "after" then it was PermissionedDomainSet

2.2 That probably means adding a check for PermissionedDomainDelete

Don't think that we need to add check for PermissionedDomainDelete. Do we actually care if the delete incorrect object ?

I can't convince myself that it's impossible for this change to result in a different ledger. The main reason is that if, somehow, a bad object had ever been written to the ledger, this check will now catch it, whereas it wouldn't have caught it before the change. That will result in the transaction succeeding on some nodes, and failing on others.

This is not true. Currently (without this PR) invariants check only "after" object for both cases ("before" and "after"), and only for PermissionedDomainSet transaction. Which mean "after" always will be present (no crash) and verified. And "before" check is not so important (as you mention early, and I agree). Also this fix is related to verification logic, not to the creation/modification logic. So I don't see a way to create invalid object (remember that "after" was check always). Neither I see the behavior change after this fix, so not need for amendment.

Correct me if I'm wrong somewhere.

oleks-rip avatar Dec 12 '25 20:12 oleks-rip

  1. sleStatus_ is an optional. It should be a vector, in case more than one permissioned domain is affected

modified and added check for size() > 1 (which is actually impossible as there is no such transaction)

EVERY failure case in an Invariant should be impossible. The intention is less "double-check that the transaction did the right thing", and more "if something goes extremely wrong, or an attacker finds an exploit, what are some reasonable checks we can do to stop it?"

That's why invariant unit tests have to modify the open ledger directly. The things they're testing for are, as far as we know, impossible to do with a transaction.

  1. Because you're checking the integrity of the object, and not checking any of the changes between before and after

removed check for "before"

👍

  1. The conditions in finalize are only checked for successful PermissionedDomainSet transactions. 2.1 If no other transaction should be able to modify a permisioned domain object, then it should check that

I don't think it is possible. Invariant checks are all called for every tx, and I don't see a way to figure out that this particular PermissiondDomain sle object is modified by this particular tx. We just assumed that if there is "after" then it was PermissionedDomainSet

2.2 That probably means adding a check for PermissionedDomainDelete

Don't think that we need to add check for PermissionedDomainDelete. Do we actually care if the delete incorrect object ?

We don't care if the object is malformed if it's deleted, but what if it isn't deleted? What if it's deleted incorrectly?

Basic outline of how to check all of the above conditions in finalize() :

auto check = [...I think you can leave the lambda unchanged...]
if (view.rules.enabled(fixPermissionedDomainInvariant))
{
    if (result != tesSUCCESS && sleStatus_.size() == 0)
        // Nothing was changed by a failed tx. All good.
        return true;

    if (sleStatus_.size() > 1)
    {
        JLOG(j.fatal()) << "Invariant failed: transaction affected more than 1 "
                           "entry."; 
        return false;
    }

    switch (tx.getTxnType())
    {
        case ttPERMISSIONED_DOMAIN_SET:
        {
            if (sleStatus_.size() == 0)
            {
                 [Assuming that PDSet _must_ change an object, log and return false, otherwise return true]
                 JLOG(j.fatal()) << "PermissionedDomainSet should only affect one domain object";
                 return false;
            }
            auto const& sleStatus = sleStatus_[0];
            if(sleStatus.isDelete) [Requires adding isDelete to the struct and seting it in `visitEntry()`]
            {
                 JLOG(j.fatal()) << "Domain object deleted by PermissionedDomainSet";
                 return false;
            }
            return check(sleStatus, j);
        }
        case ttPERMISSIONED_DOMAIN_DELETE:
        {
            if(sleStatus_.size() == 0)
            {
                 [Assuming that PDDelete _must_ delete an object, log and return false, otherwise return true]
                 JLOG(j.fatal()) << "No domain objects affected by PermissionedDomainDelete";
                 return false;
            }
            if(!sleStatus_[0].isDelete)	[Requires adding isDelete to the struct and seting it in `visitEntry()`]
            {
                 JLOG(j.fatal()) << "Domain object modified, but not deleted by PermissionedDomainDelete";
                 return false;
            }
            return true;
        }
        default:
        {
            if (sleStatus_.size() != 0)
            {
                JLOG(j.fatal()) << sleStatus_.size() << " domain object(s) affected by an unauthorized transaction.";
                return false;
            }
            return true;
        }
    }
}
else
{
    // The pre-amendment behavior
    if (tx.getTxnType() != ttPERMISSIONED_DOMAIN_SET || result != tesSUCCESS)
        return true;

    return !sleStatus_.empty() ? check(sleStatus_[0], j) : true;
}

I can't convince myself that it's impossible for this change to result in a different ledger. The main reason is that if, somehow, a bad object had ever been written to the ledger, this check will now catch it, whereas it wouldn't have caught it before the change. That will result in the transaction succeeding on some nodes, and failing on others.

This is not true. Currently (without this PR) invariants check only "after" object for both cases ("before" and "after"), and only for PermissionedDomainSet transaction. Which mean "after" always will be present (no crash) and verified. And "before" check is not so important (as you mention early, and I agree).

Ironically, I realized that removing the before check does not require an amendment for exactly this reason.

Also this fix is related to verification logic, not to the creation/modification logic. So I don't see a way to create invalid object (remember that "after" was check always). Neither I see the behavior change after this fix, so not need for amendment.

"I don't see a way to create invalid object" is exactly what Invariants are for. You may not see a way to create an invalid object, or two objects, or to delete an object that shouldn't be deleted, but that doesn't mean that there isn't one.

Back to what I said earlier, the intention of Invariants is to catch those impossible situations that end up happening despite all our best efforts. Maybe you made a mistake. Maybe there's some subtle exploit that nobody can predict. Or maybe an inexperienced developer makes a change five years from now without understanding all the implications and consequences.

The questions to ask yourself are

  • "How could I have screwed up?"
  • "How can someone else screw this up?"
  • "If someone does screw this up, what are the things that are most important to prevent?"

For example, let's assume your implementation of the transactors is perfect. Great. Down the road, there's a new feature that has nothing to do with those transactions. Product, design, or the developer decides it would be cool to automatically create a Permissioned Domain automatically as part of the new HeyThisIsCool transaction. The engineer screws up. They implement it without referring to the current transactors, or using any of the helpers that you've written. The credentials are unsorted, or have duplicates, or there are too many of them, or whatever.

Oh, and they don't look at the Invariants either. Maybe they figure "Hey, if the invariants don't complain, then it must be ok."

As currently written, the Invariants won't complain.

And to add icing to the cake, the code review doesn't catch it. And testing maybe doesn't even know it exists, or they don't thoroughly exercise all the possible edge cases.

"Oh, that can't happen..."

  • #4626
  • #4638
  • #4663

Despite a very good developer implementing the feature, and a boatload of time spent on the code reviews, including onr by me, that issue was discovered and reported by external contributors (including a former Rippler, but we can't count on that). I don't remember the exact timeline, but I think it was before QA got their hands on it, Even so, I don't think they would have caught it, because they're not in the habit of checking the entire object tree for stray objects. (Maybe they should?)

ximinez avatar Dec 13 '25 00:12 ximinez