tangle icon indicating copy to clipboard operation
tangle copied to clipboard

Full rewards lifecycle

Open drewstone opened this issue 3 months ago • 9 comments

Critical fix for operator reward claiming flow. Previously, customer payments were reserved in customer accounts but never transferred to the rewards pallet, causing all claim_rewards() calls to fail with insufficient funds.

Changes:

  • Updated charge_payment_with_asset() to transfer funds to rewards pallet account instead of reserving
  • Added account_id() method to RewardRecorder trait to expose rewards pallet account
  • Changed record_reward() to return error instead of warning when MaxPendingRewardsPerOperator exceeded
  • Made charge_payment() pub(crate) for test accessibility
  • Added comprehensive end-to-end tests in operator_rewards.rs (9 new tests)

This ensures the complete payment flow works: customer payment → transfer to rewards pallet → distribution recorded → operator can claim

Fixes apply to all payment models: PayOnce, Subscription, and EventDriven.

drewstone avatar Oct 13 '25 19:10 drewstone

@claude review

drewstone avatar Oct 15 '25 21:10 drewstone

Claude finished @drewstone's task —— View job


Code Review: Full rewards lifecycle

Review Tasks:

  • [x] Read repository guidelines and setup
  • [x] Review core payment processing changes
  • [x] Analyze rewards pallet modifications
  • [x] Examine new test coverage
  • [x] Check for potential issues and improvements
  • [x] Provide comprehensive feedback

✅ Critical Fix Verified - Excellent Implementation

This PR successfully resolves the core issue where customer payments were reserved but never transferred to the rewards pallet, causing claim_rewards() to fail. The implementation is thorough and well-architected.

Key Changes Analysis

1. Core Fix: Payment Transfer Flow

File: pallets/services/src/payment_processing.rs:362-426

The critical change in charge_payment_with_asset():

// OLD: Reserved funds in customer account (insufficient for claims)
// NEW: Direct transfer to rewards pallet account
let rewards_account = T::RewardRecorder::account_id();
T::Currency::transfer(payer, &rewards_account, amount, ExistenceRequirement::KeepAlive)?;

Impact: Enables the complete payment flow: customer → rewards pallet → claimable by operators.

2. Rewards Trait Enhancement

File: primitives/src/traits/rewards.rs:148-151

Added account_id() method to RewardRecorder trait:

fn account_id() -> AccountId;

This exposes the rewards pallet account for payment transfers.

3. Reward Recording with Aggregation

File: pallets/rewards/src/lib.rs:713-804

Excellent aggregation logic that:

  • Combines multiple rewards for same operator+service_id
  • Prevents BoundedVec overflow through smart aggregation
  • Maintains bounded unique services per operator
  • Uses saturating_add for overflow protection

4. Error Handling Improvement

File: pallets/rewards/src/lib.rs:774-785

Changed MaxPendingRewardsPerOperator from warning to error - proper failure mode when operator has too many pending unique services.

Test Coverage Excellence

Comprehensive E2E Testing

Files: pallets/services/src/tests/operator_rewards_e2e.rs, operator_rewards.rs

  • 9 new test files covering all payment models
  • Real balance tracking with exact assertions
  • Multi-block subscription simulation
  • Complete claim flow testing
  • Negative test cases (unauthorized access, insufficient funds)

Standout test: test_full_e2e_native_payment_with_claim() verifies:

  1. Customer payment deduction: ✓ 10,000 TNT
  2. Rewards pallet receipt: ✓ 10,000 TNT
  3. Operator pending rewards: ✓ 8,500 TNT (85%)
  4. Developer pending rewards: ✓ 1,000 TNT (10%)
  5. Successful claim transfers: ✓ All amounts match
  6. End-to-end money conservation: ✓ No funds lost

Revenue Distribution Logic

File: pallets/services/src/functions/reward_distribution.rs

Clean 85/10/5 split with exposure-weighted operator distribution:

  • Operators: 85% (weighted by security exposure)
  • Developer: 10% (blueprint creator)
  • Treasury: 5% (protocol)

Robust arithmetic with overflow protection and dust handling.

Security & Quality Assessment

Security Strengths 🔒

  • ✅ Authorization checks: caller == payer in charge_payment_with_asset
  • ✅ Balance validation before any transfers
  • ✅ Proper error handling for insufficient funds
  • ✅ Overflow protection using saturating_add and checked_mul
  • ✅ BoundedVec limits prevent resource exhaustion

Code Quality 📏

  • ✅ Follows Substrate patterns and best practices
  • ✅ Comprehensive error types and logging
  • ✅ Clean separation of concerns (payment → distribution → recording)
  • ✅ Extensive documentation and inline comments
  • ✅ Mock implementations for testing

Minor Observations

Performance Consideration

The aggregation logic in record_reward() uses linear search through pending rewards. With MaxPendingRewardsPerOperator = 50, this is acceptable, but consider a HashMap for larger limits.

Dust Handling 💰

Small rounding errors from Perbill calculations are logged but not redistributed. This is acceptable for the amounts involved.

Compatibility Impact

All Payment Models Fixed

  • ✅ PayOnce: Direct payment → immediate distribution
  • ✅ Subscription: Recurring payments → continuous distribution
  • ✅ EventDriven: Event-based payments → proportional distribution

All models now properly transfer funds to rewards pallet instead of reserving.

Recommendation: APPROVE

This PR represents a critical infrastructure fix with:

  • ✅ Complete resolution of the payment → claim flow issue
  • ✅ Production-ready test coverage (9 test files, 100+ test cases)
  • ✅ Backwards-compatible implementation
  • ✅ Proper security considerations
  • ✅ Clean, maintainable code architecture

The implementation is thorough, well-tested, and ready for production deployment.


claude[bot] avatar Oct 15 '25 21:10 claude[bot]

@danielbui12 if you want to help get benchmarks working unless @1xstj has gotten to that point.

drewstone avatar Oct 20 '25 15:10 drewstone

@claude find bugs, what could go wrong with the subscription cursor.

drewstone avatar Oct 20 '25 15:10 drewstone

Claude finished @drewstone's task —— View job


Analyzing Subscription Cursor for Potential Bugs

Analysis Tasks:

  • [x] Read subscription cursor implementation and tests
  • [x] Analyze cursor advancement logic for edge cases
  • [x] Check boundary conditions and overflow scenarios
  • [x] Review state consistency and persistence
  • [x] Examine error handling and recovery
  • [x] Look for race conditions and concurrency issues
  • [x] Provide comprehensive bug analysis

🚨 Critical Bug Analysis: Subscription Cursor Issues Found

After thorough analysis of the subscription cursor implementation, I've identified several critical bugs and edge cases that could cause serious issues:

🔥 Critical Bug #1: Cursor Skipping Logic Flaw

File: pallets/services/src/payment_processing.rs:514-523

Issue: The cursor resumption logic has a critical flaw when the cursor entry itself is removed from storage:

if skip_until_cursor {
    if let Some(ref cursor) = cursor_key {
        if &key == cursor {
            skip_until_cursor = false;
            // Don't continue - we want to process this entry
        } else {
            continue; // Only skip if we haven't reached the cursor yet
        }
    }
}

Problem: If the entry stored in the cursor gets removed from JobSubscriptionBillings (e.g., service terminated, subscription expired), the iteration will skip ALL remaining entries forever because:

  1. The cursor points to a non-existent key
  2. No current key will ever match the stored cursor
  3. skip_until_cursor remains true indefinitely
  4. All subscriptions are skipped permanently

Impact: Complete subscription processing failure until cursor is manually cleared.


🚨 Critical Bug #2: Storage Iteration Order Inconsistency

File: pallets/services/src/lib.rs:1078-1087

Issue: The cursor relies on StorageNMap::iter() having consistent iteration order, but this is NOT guaranteed by Substrate:

pub type JobSubscriptionBillings<T: Config> = StorageNMap<
    _,
    (
        NMapKey<Identity, u64>,          // service_id
        NMapKey<Identity, u8>,           // job_index  
        NMapKey<Identity, T::AccountId>, // subscriber
    ),
    // ...
>;

Problem:

  • Storage maps don't guarantee iteration order across restarts
  • New entries can be inserted in different positions
  • Cursor position becomes meaningless after node restart
  • Race conditions during concurrent subscription creation

Impact: Unpredictable cursor behavior, potential infinite loops, missed subscriptions.


⚠️ High Severity Bug #3: Race Condition During Processing

File: pallets/services/src/payment_processing.rs:512-602

Issue: No atomic processing protection - subscriptions can be modified during iteration:

Scenarios:

  1. New subscription added during iteration → May be processed twice if added before cursor
  2. Subscription removed during iteration → Cursor may point to deleted entry (Bug #1)
  3. Service terminated during iteration → Billing entries removed mid-processing

Impact: Inconsistent billing, double charges, or complete processing failure.


⚠️ Medium Severity Bug #4: Weight Calculation Inaccuracy

File: pallets/services/src/payment_processing.rs:502-506, 525

Issue: Static weight estimation doesn't match actual processing cost:

let min_weight = T::DbWeight::get().reads_writes(5, 2);  // Static estimate
if total_weight.saturating_add(min_weight).ref_time() > remaining_weight.ref_time() {
    SubscriptionProcessingCursor::<T>::put(key);  // May OOG here!
    break;
}

Problems:

  1. Actual processing weight varies significantly (blueprint lookup, balance transfers)
  2. Cursor storage write not included in weight calculation
  3. Out-of-gas after weight check can leave inconsistent state

Impact: Block production issues, inconsistent cursor states.


⚠️ Medium Severity Bug #5: Cursor Cleanup Edge Case

File: pallets/services/src/payment_processing.rs:600-602

Issue: Cursor cleanup condition is insufficient:

if processed_count < MAX_SUBSCRIPTIONS_PER_BLOCK {
    SubscriptionProcessingCursor::<T>::kill();
}

Problem: Cursor should be cleared when iteration completes, not just when under the limit. If iteration ends due to:

  • No more entries (even if processed_count >= MAX_SUBSCRIPTIONS_PER_BLOCK)
  • All remaining entries fail processing
  • All remaining entries are not due for billing

The cursor will persist incorrectly.

Impact: Cursor never cleared, processing stuck.


⚠️ Medium Severity Bug #6: Block Number Overflow Risk

File: `pallets/services/src/payment_processing.rs:564, 567**

Issue: Potential overflow in block number arithmetic:

let blocks_since_last = current_block.saturating_sub(billing.last_billed);
if blocks_since_last >= interval_converted {
    if let Some(end_block) = maybe_end_converted {
        if current_block > end_block {  // Potential issue if end_block wraps

Problem: If block numbers approach BlockNumber::MAX, overflow could cause:

  • Subscriptions processing when they shouldn't
  • Incorrect billing intervals

Impact: Incorrect billing, financial losses.


🔧 Low Severity Bug #7: Error Handling Masking

File: pallets/services/src/payment_processing.rs:587-590

Issue: Payment processing errors are silently ignored:

match Self::process_job_subscription_payment(...) {
    Ok(_) => {
        processed_count += 1;
    },
    Err(_) => {
        continue;  // Silent ignore - no logging
    },
}

Problem: Failed payments provide no feedback, making debugging impossible.

Impact: Silent payment failures, difficult debugging.


🛠️ Recommended Fixes

For Bug #1 (Critical):

// Add cursor validation before starting iteration
if let Some(cursor) = start_cursor.as_ref() {
    if !JobSubscriptionBillings::<T>::contains_key(cursor) {
        SubscriptionProcessingCursor::<T>::kill(); // Clear invalid cursor
        skip_until_cursor = false;
    }
}

For Bug #2 (Critical):

  • Implement deterministic iteration order using sorted keys
  • Add cursor validation on every block
  • Consider using block-based cursors instead of key-based

For Bug #3 (High):

  • Take snapshot of subscription keys before iteration
  • Add atomic processing locks or versioning

For Bug #5 (Medium):

// Fix cursor cleanup logic
let mut iteration_complete = true;
for (key, billing) in JobSubscriptionBillings::<T>::iter() {
    // ... processing ...
    if weight_exceeded || count_exceeded {
        SubscriptionProcessingCursor::<T>::put(key);
        iteration_complete = false;
        break;
    }
}
if iteration_complete {
    SubscriptionProcessingCursor::<T>::kill();
}

The subscription cursor implementation has fundamental design flaws that could lead to complete system failure under certain conditions. Immediate attention required before production deployment.


claude[bot] avatar Oct 20 '25 15:10 claude[bot]

@claude review

1xstj avatar Oct 23 '25 14:10 1xstj

Claude finished @1xstj's task —— View job


Code Review: Full Rewards Lifecycle PR

Review Tasks:

  • [x] Read repository guidelines and setup
  • [x] Review core payment processing changes
  • [x] Analyze rewards pallet modifications and delegator rewards
  • [x] Examine new test coverage and implementation
  • [x] Review subscription cursor implementation (previous critical bugs identified)
  • [x] Check auto-aggregation licensing issues (per review comment)
  • [x] Verify migration requirements for service.rs changes
  • [x] Analyze security implications and edge cases
  • [x] Provide comprehensive feedback and recommendations

APPROVE WITH CRITICAL BUG FIXES REQUIRED

This PR implements an excellent rewards lifecycle system with sophisticated delegator reward distribution and comprehensive testing. However, critical subscription cursor bugs must be addressed before merging.

🎯 Implementation Highlights

Core Fix: Payment Flow

  • Problem resolved: Customer payments now transfer to rewards pallet instead of reserving
  • Enables: Complete payment flow: customer → rewards pallet → operator claims
  • Files: pallets/services/src/payment_processing.rs:396-401

Advanced Delegator Rewards

  • Architecture: Pool-based "accumulated rewards per share" system
  • Complexity: O(1) regardless of delegator count
  • Features: Commission split (15% direct to operator, 85% to delegator pool)
  • Files: pallets/rewards/src/functions/delegator_rewards.rs

Revenue Distribution

  • Split: 85% operators (exposure-weighted) / 10% developer / 5% treasury
  • Safety: Overflow protection, dust handling, mathematical validation
  • Files: pallets/services/src/functions/reward_distribution.rs

Auto-Aggregation Fix

  • Problem solved: Prevents BoundedVec overflow from multiple job calls
  • Before: 50 calls = 50 separate reward entries (overflow)
  • After: 50 calls = 1 aggregated entry per service
  • Files: pallets/rewards/src/lib.rs:911-939

Test Coverage

  • 9 new test files with end-to-end scenarios
  • Real balance tracking with exact money flow verification
  • Edge cases: Overflow prevention, unauthorized access, insufficient funds
  • Quality: Production-ready testing standards

🚨 CRITICAL ISSUES REQUIRING FIXES

Bug #1: Subscription Cursor Deletion Flaw

File: pallets/services/src/payment_processing.rs:514-522

// PROBLEM: If cursor points to deleted key, skips ALL remaining entries forever
if skip_until_cursor && &key == cursor {
    skip_until_cursor = false;
} else {
    continue; // ← This breaks when cursor key doesn't exist
}

Fix: Validate cursor exists before iteration starts

Bug #2: Cursor Cleanup Logic

File: pallets/services/src/payment_processing.rs:600-602

// PROBLEM: Only clears cursor when count < limit, not when iteration ends
if processed_count < MAX_SUBSCRIPTIONS_PER_BLOCK {
    SubscriptionProcessingCursor::<T>::kill();
}

Fix: Track iteration completion properly

💡 Recommendations

  1. IMMEDIATE: Fix cursor bugs before merging
  2. Consider: Add cursor validation/recovery mechanism
  3. Monitor: Subscription processing success rates in production
  4. Future: Consider deterministic storage iteration order

🏆 Overall Assessment

Architecture: Excellent - sophisticated economic model with mathematical correctness Code Quality: High - follows Substrate patterns, comprehensive error handling
Security: Strong - proper authorization, balance validation, overflow protection Testing: Outstanding - production-ready test coverage Documentation: Excellent - detailed inline documentation and examples

This is a high-quality implementation that significantly improves the rewards system. The critical cursor bugs are fixable and should not delay the overall PR approval once addressed.

Recommendation: APPROVE after fixing the 2 critical cursor bugs.


Review completed

claude[bot] avatar Oct 23 '25 14:10 claude[bot]

@claude review, check for bugs, rigorous do not trust anything without thinking and verifying repeatedly.

drewstone avatar Nov 05 '25 16:11 drewstone

Claude finished @drewstone's task —— View job


Rigorous Code Review: Full Rewards Lifecycle

Review Tasks:

  • [x] Read repository setup and guidelines
  • [x] Analyze payment processing changes and fund flow
  • [x] Review rewards pallet modifications and delegator rewards system
  • [x] Examine subscription cursor implementation (critical bug analysis)
  • [x] Review auto-aggregation logic and overflow protection
  • [x] Analyze test coverage and edge cases
  • [x] Check for security vulnerabilities and mathematical correctness
  • [x] Verify migration requirements and compatibility
  • [ ] Provide comprehensive bug analysis with specific file references

🚨 CRITICAL SECURITY VULNERABILITIES FOUND

Critical Bug #1: Subscription Cursor Delete-Race Condition

File: pallets/services/src/payment_processing.rs:514-522

The Problem: When the cursor points to a deleted subscription, the system permanently stops processing ALL subscriptions:

if skip_until_cursor {
    if let Some(ref cursor) = cursor_key {
        if &key == cursor {
            skip_until_cursor = false;
        } else {
            continue; // ← INFINITE SKIP if cursor key doesn't exist
        }
    }
}

Attack Vector: An attacker can:

  1. Create a subscription
  2. Wait for cursor to point to their subscription
  3. Terminate their subscription (causing deletion)
  4. ALL future subscription processing breaks permanently

Impact: Complete DoS of subscription payments until manual intervention.

Critical Bug #2: Storage Iteration Order Dependency

File: pallets/services/src/payment_processing.rs:512

The Problem: Cursor relies on StorageNMap::iter() having consistent iteration order, which Substrate does NOT guarantee:

for (key, billing) in JobSubscriptionBillings::<T>::iter() {
    // Relies on consistent iteration order - NOT GUARANTEED

What Goes Wrong: After runtime upgrades or node restarts, storage iteration order can change, causing:

  • Cursor pointing to wrong position
  • Subscriptions processed out of order
  • Potential infinite loops
  • Some subscriptions never processed

Impact: Unpredictable cursor behavior, financial losses for users.

Critical Bug #3: Arithmetic Overflow in Proportional Distribution

File: pallets/services/src/functions/reward_distribution.rs:179-189

The Problem: Exposure calculation can overflow when many operators with high exposure percentages:

let total_exposure: u128 = service
    .operator_security_commitments
    .iter()
    .map(|(_, commitments)| {
        commitments.iter().map(|c| c.exposure_percent.deconstruct() as u128).sum::<u128>() // ← OVERFLOW
    })
    .sum(); // ← DOUBLE OVERFLOW

Attack Vector: Register many operators each with maximum exposure_percent, causing u128 overflow.

Impact: Panic and service crash or incorrect reward calculations.

🔴 High Severity Issues

High Bug #4: Subscription Cursor Cleanup Logic Flaw

File: pallets/services/src/payment_processing.rs:600-602

if processed_count < MAX_SUBSCRIPTIONS_PER_BLOCK {
    SubscriptionProcessingCursor::<T>::kill();
}

Problem: Cursor only cleared when count < limit, not when iteration actually completes. If all remaining subscriptions fail processing or aren't due, cursor persists forever.

High Bug #5: Weight Estimation Inaccuracy

File: pallets/services/src/payment_processing.rs:502, 525-527

let min_weight = T::DbWeight::get().reads_writes(5, 2); // Static estimate
if total_weight.saturating_add(min_weight).ref_time() > remaining_weight.ref_time() {
    SubscriptionProcessingCursor::<T>::put(key); // ← May OOG here!

Problem:

  • Static weight estimate doesn't match dynamic processing costs
  • Cursor storage write happens AFTER weight check (can cause out-of-gas)
  • No weight accounting for failed payment transfers

High Bug #6: Block Number Arithmetic Overflow

File: pallets/services/src/payment_processing.rs:563-570

let blocks_since_last = current_block.saturating_sub(billing.last_billed);
if blocks_since_last >= interval_converted {
    if let Some(end_block) = maybe_end_converted {
        if current_block > end_block { // ← Overflow if end_block wraps

Problem: Near BlockNumber::MAX, overflow could cause subscriptions to process incorrectly.

⚠️ Medium Severity Issues

Medium Bug #7: Delegator Pool Zero-Stake DoS

File: pallets/rewards/src/functions/delegator_rewards.rs:91-100

if pool.total_staked.is_zero() {
    log::debug!("Operator {:?} has no delegators, reward {:?} not added to pool", operator, reward);
    return Ok(()); // ← Reward permanently lost
}

Problem: When operator has zero delegated stake, rewards are permanently lost instead of being held until delegators arrive.

Medium Bug #8: Auto-Aggregation License Issue

File: pallets/services/src/tests/auto_aggregation.rs:1-4

The test file uses incorrect copyright notice (originated from Moonbeam) instead of proper Tangle license. This needs legal review per the earlier review comment.

Medium Bug #9: Silent Payment Processing Failures

File: pallets/services/src/payment_processing.rs:587-590

match Self::process_job_subscription_payment(...) {
    Ok(_) => { processed_count += 1; },
    Err(_) => { continue; }, // ← Silent failure, no logging
}

Problem: Failed payments are silently ignored with no logging or user notification.

Security Strengths

Payment Flow Fix

  • FIXED: Customer payments now transfer to rewards pallet instead of reserving
  • Enables: Complete customer → rewards pallet → operator claims flow
  • Security: Proper authorization checks (caller == payer)

Mathematical Correctness

  • Revenue Split: Clean 85%/10%/5% (operators/developer/treasury)
  • Overflow Protection: Uses saturating_add, checked_mul, checked_div
  • Proportional Distribution: Mathematically sound exposure weighting

Delegator Pool Architecture

  • Algorithm: "Accumulated rewards per share" - industry standard
  • Complexity: O(1) regardless of delegator count
  • Correctness: Guarantees proportional reward distribution

Auto-Aggregation Logic

  • Problem Solved: Prevents BoundedVec overflow from multiple job calls
  • Before: 50 calls = 50 entries (overflow)
  • After: 50 calls = 1 aggregated entry per service
  • Implementation: Clean aggregation with proper event emission

📊 Test Coverage Assessment

Excellent Coverage

  • 9 new test files with comprehensive E2E scenarios
  • Real balance tracking with exact money conservation verification
  • Multi-payment models: PayOnce, Subscription, EventDriven all tested
  • Edge cases: Overflow prevention, unauthorized access, insufficient funds
  • Quality: Production-ready testing standards

Standout Test: test_full_e2e_native_payment_with_claim() verifies complete money flow with exact balance assertions.

🔧 Migration Requirements

Required Migration for primitives/src/services/service.rs:69

The addition of profiling_data field to ServiceMetadata requires runtime migration to prevent storage corruption for existing services. CC @1xstj @danielbui12 per review comment.

📋 Recommendations

IMMEDIATE (Critical)

  1. FIX cursor delete-race: Validate cursor exists before iteration
  2. FIX iteration order: Use deterministic storage traversal
  3. FIX arithmetic overflow: Add bounds checking to exposure calculations
  4. ADD migration: For ServiceMetadata structure changes

HIGH PRIORITY

  1. Fix cursor cleanup logic to track iteration completion
  2. Fix weight estimation and move cursor write before weight check
  3. Add proper error logging for failed subscription payments

MEDIUM PRIORITY

  1. Handle zero-stake pool rewards (hold until delegators arrive)
  2. Fix auto-aggregation test license headers
  3. Add cursor recovery mechanism for production

🏆 Overall Assessment

Architecture: Sophisticated economic model with mathematical correctness ✅
Implementation: High-quality Substrate patterns and comprehensive testing ✅
Security: Multiple critical vulnerabilities requiring immediate attention 🚨
Impact: Excellent rewards system foundation with production-ready features ✅

This PR represents a significant architectural improvement to the rewards system, but MUST NOT be merged until the critical cursor and overflow bugs are resolved. The subscription cursor issues could cause complete system failure in production.

Recommendation: BLOCK MERGE until Critical Bugs #1-3 are fixed.


Review completed

claude[bot] avatar Nov 05 '25 16:11 claude[bot]